Skip to main content
Horticulture Research logoLink to Horticulture Research
. 2024 Nov 21;12(3):uhae330. doi: 10.1093/hr/uhae330

Phytop: a tool for visualizing and recognizing signals of incomplete lineage sorting and hybridization using species trees output from ASTRAL

Hong-Yun Shang 1,2,3,#, Kai-Hua Jia 4,#, Nai-Wei Li 5,#, Min-Jie Zhou 6,7,8, Hao Yang 9,10,11, Xiao-Ling Tian 12, Yong-Peng Ma 13,14,, Ren-Gang Zhang 15,16,17,
PMCID: PMC11879507  PMID: 40046323

Abstract

Incomplete lineage sorting (ILS) and introgression/hybridization (IH) are prevalent in nature and thus frequently result in discrepancies within phylogenetic tree topologies, leading to misinterpretation of phylogenomic data. Despite the availability of numerous tools for detecting ILS and IH among species, many of these tools lack effective visualization, or are time-consuming, or require prior predetermination. Here, we addressed these shortcomings by developing a fast-running, user-friendly tool called Phytop. By defining ILS and IH indices to quantify ILS and IH, this tool can detect the extent of ILS and IH among lineages with high reliability, and can visualize them based on the gene tree topology patterns constructed using ASTRAL. We tested Phytop extensively using both simulated and real data, and found that it enables users to quickly and conveniently estimate the extent of ILS and IH, thus clarifying the phylogenetic uncertainty. Phytop is available at https://github.com/zhangrengang/phytop and is expected to contribute to the intuitive and convenient inference of genetic relationships among lineages in future research.

Introduction

Understanding the process of speciation is crucial for elucidating phylogenetic relationships, yet it is often complicated by phenomena such as incomplete lineage sorting (ILS) and introgression/hybridization (IH) [1–8]. These processes must be carefully considered, as they frequently result in conflicting phylogenetic tree topologies, posing significant challenges to researchers [9]. For example, the phylogenetic relationships among species within the superrosids, a large clade of angiosperms, often exhibit inconsistencies across different datasets and methods, with these inconsistencies being largely attributed to ILS [10, 11]. Similarly, introgression or hybridization, particularly in cases of certain species of hybrid origin, can lead to numerous conflicts between gene trees constructed for species and their parental lineages, as is observed in cases such as Juglans regia and J. sigillata [12], the Ericales [13], Papaver [14], Fragaria × ananassa [15], Aegilops tauschii [16], and Triticum turgidum [16].

With the growth of phylogenomics data, many tools have been developed to infer ILS or IH, including HyDe [17], MSCquartets [18], theta/Reticulation Index [6], Dsuite [19], BEAST [20], PhyloNet [21], PhyloNetworks [22], and BPP [23]. Two main classes of methods have been adopted by existing tools to detect hybridization. The D-statistic (also known as Patterson's D or ABBA-BABA test) [24] is one of the most well-known hybridization detection methods. Based on the D-statistic, HyDe [17], and Dsuite [19] can be used to infer hybridization from summary statistics calculated from site patterns of population or species quartets. However, the accuracy of these methods can be reduced by ancient hybridization, which violates the infinite sites assumption [25]. Another major way of detecting hybridization is to estimate a phylogenetic network for the detection of IH. Most of the software that adopts this approach, including NANUQ [26] in MSCquartets [18], InferNetwork_MPL [27] and InferNetwork_ML [28] in PhyloNet [21], and SNaQ [29] in PhyloNetworks [22], uses likelihood methods or Bayesian inference, based on gene tree topologies or subtree topologies. This method is generally better suited to genomic datasets with multiple taxa but is computationally costly, and the information derived solely from gene tree topologies may not be sufficient to accurately distinguish the direction of gene flow [30, 31]. The full likelihood method makes comprehensive use of gene tree topologies and branch lengths, allowing for a more accurate identification of various gene flow scenarios, but it comes with a significantly increased computational burden [30, 32]. Methods using this approach include MCMC-SEQ [33] in PhyloNet [21], SpeciesNetwork in BEAST [20], and BBP [23]. Although the full likelihood method is a more efficient approach for the detection of gene flow and performs better in distinguishing different introgression scenarios, its substantial computational load limits the application when dealing with a large number of taxa [30]. Thus, it is generally recommended to use a strategy that initially uses the fast but less accurate methods to come up with alternative hypotheses, and then to use the more accurate but slow methods (e.g. BPP) to test the hypotheses [30].

An additional problem is that none of the abovementioned tools provide effective visualization to intuitively display both ILS and IH among lineages, which discourages researchers from analyzing evolutionary networks in a more simplified manner. Tools like PhyParts [34] and DiscoVista [35] offer visualization of the gene tree discordance but do not provide quantifiable explanations. Furthermore, most of these tools require substantial programming skills and cannot quickly and conveniently provide quantitative estimates of ILS and IH, which is essential for subsequent analyses. Therefore, in practice, there is an urgent need for a fast-running tool that can quantify both ILS and IH and facilitate visualization to help optimize research steps. Such a tool would enable researchers to prioritize the exclusion of lineages with few phylogenetic conflicts, and then further analyze hard-to-resolve lineages using a combination of more robust methods, saving significant amounts of time and improving efficiency.

In this study, we developed Phytop, a simple and intuitive visualization tool that reflects the heterogeneity within phylogenetic species trees inferred using ASTRAL [36]. Phytop is freely available on github (https://github.com/zhangrengang/phytop). By visualizing the proportions and metrics of inferred topological structures, Phytop not only enables visualization of gene tree discordance along with each node of a species tree, but also provides interpretable measures of ILS and IH with high reliability. Even for phylogenetic trees comprising many lineages, Phytop can complete the computational process in a very short time, addressing the limitations of current tools. Moreover, Phytop, which visualizes the results of the popular tool ASTRAL for species tree inference [36], is easy to initiate and to understand, allowing for convenient, quick, and effective detection of the ILS and IH signals among lineages.

Results

Definition of the ILS and IH indices and expectations arising from these indices

We developed two indices for visually assessing ILS and IH signals in a species tree. We take a rooted species tree of three taxa as an example. The three taxa are the first child (or left child, L), the second child (or right child, R), and the sister group (S) (Fig. 1a). The rooted gene trees of the three taxa have at most three topologies: ((L, R), S), ((L, S), R), and ((S, R), L), with respective proportions of q1, q2 and q3 (q1 + q2 + q3 = 1) (Fig. 1b). Under an ILS-only scenario (Fig. 1a), q2 = q3 is expected. When IH occurs with no ILS, for example an IH event from S to L (Fig. 1a), we expect q2 > > q3. For ease of understanding, we defined two indices to separately quantify the strength of ILS and IH signals. The first, named the ‘ILS index’, is scaled to a range of 0% to 100% and reflects the strength of ILS. When the ILS index is at its maximum of 100% and there is no IH, we expect that q1 = q2 = q3 = 100%/3 (Fig. 1c). The second measure, abbreviated as the ‘IH index’, is essentially equivalent in concept to the previously defined inheritance probability (γ), gene flow rate (λ), or hybridization index [17, 37]. This index ranges from 0% to 50%. When the IH index reaches 50% and there is no ILS, q1 = q2 = 50% is expected (Fig. 1d). When both ILS and IH are present, the proportions of q1, q2, and q3 correspond to the different ILS index and IH index values, as shown in Fig. 1e. Under these expectations, the two indices can be therefore estimated through the patterns of q1, q2, and q3 (output from ASTRAL) and visualized using our Phytop tool. For the computational methods, please refer to the section Methods.

Figure 1.

Figure 1

Theoretical expectations of the ILS and IH indices, taking the three species S, L, and R as an example. (a) The rooted species tree ([S, [L, R]]) and gene trees for ILS-only and IH-only scenarios. There are three and two gene tree topologies under ILS-only and IH-only scenarios, respectively. (b) The three possible topological structures of the gene trees, with the first topology identical to that of species tree. The proportions of the three topologies are denoted q1, q2, and q3. (c) Patterns of q1, q2, and q3 under an ILS-only scenario with different ILS index values. (d) Patterns of q1, q2, and q3 under an IH-only scenario with different IH index values. (e) Patterns of q1, q2, and q3 when both ILS and IH are present with different ILS index and IH index values.

The performance of the indices in a simple hybridization model

We constructed simulated data to evaluate the reliability of the ILS and IH indices. In the simple hybridization model, lineage 1 served as the outgroup and lineage 3 was a hybrid originating from lineages 2 and 4, as illustrated in Fig. 2a (for all the phylogenetic trees constructed by ASTRAL in this study, Phytop can be completed in just a few minutes). We set the ILS index to range from 0 to 1, with increments of 0.1, and set the IH index to range from 0 to 0.5, with increments of 0.05. For example, when we set the IH index to 0.4 and the ILS index to 0.368, we observed an IH index of 40.5% and an ILS index of 36.1% using Phytop (Fig. 2a), which was highly consistent with the expected values. For the full simulated data, the observed ILS index was essentially equal to the expected ILS index, with little variation, and was essentially unaffected by different setting of the IH index (Fig. 2b). The observed IH index was found to align for the most part with the expected IH index, although the variations were large at high ILS levels and IH was sometimes even undetectable (Fig. 2c). For instance, when the ILS index was set to 1, the observed IH indices were zero across the board, and IH was thus completely undetectable (Fig. 2c). However, this is somewhat expected, because the high ILS significantly reduces the relative difference between q2 and q3. This issue still challenges current tools and needs to be addressed in the future.

Figure 2.

Figure 2

Evaluation of the IH and ILS indices in simulated data under a simple hybridization model. (a) We used fastsimcoal2 to simulate data for the simple hybridization model. In this model, lineage 3 is a hybrid resulting from hybridization between lineages 2 and 4 (left). L and H represent the ILS index and IH index set in the simulation, respectively. A more detailed description of the model parameters can be found in the Methods. An example of assessment of the degree of ILS and IH in the simulated data using the ILS and IH indices is shown (right). n represents the number of simulated gene trees, P is the P-value of the χ2 test to check whether the number of topologies q2 and q3 are equal, ILS-i and IH-i represent the calculated ILS and IH indices, respectively, and ILS-e and IH-e represent the proportion of gene tree topological incongruence that can be explained by ILS and IH, respectively (see Methods). (b) The relationship between the expected ILS index and the observed ILS index under different settings of the IH index in simulated data. The shallows indicate 95% CI of the observed ILS index. (c) The relationship between the expected IH index and the observed IH index under different settings of the ILS index in simulated data. The shallows indicate 95% CI of the observed IH index. (d) The distribution of the observed ILS index and the observed IH index in all simulated data. (e) The proportions of observed true topologies under different settings of the ILS index and IH index in simulated data.

As the ILS levels increased, the variations in the observed IH index increased, and the detectability of the IH decreased (Fig. 2c and d). Thus, with ILS levels increasing, a low IH index appears to be undetectable, and the boundary at which IH is undetectable increases exponentially with increasing ILS levels (Fig. 2d). In addition, the error rates of the observed species tree topology increased when either the ILS index or the IH index is high and close to their maximum values (Fig. 2d and e). However, this is also expected, because the three equivalent topologies are expected when ILS index reaches 100%, and two equivalent topologies are expected when the IH index reaches 50% (Fig. 1c and d), which leads to different topologies from the expected species tree topology. In summary, the observed IH index was found to be greatly affected by the ILS level, while the observed ILS index was almost totally unaffected by IH. The performances of the indices met our expectations using simulated data and suggest they are reliable.

In addition, we simulated different species divergence times under the simple hybridization model. The performance of the observed ILS and IH indices did not change obviously with the simulated divergence times varying from 1e3 to 1e8 generations (Figs S1 and S2). This suggested robustness for both ancient and recent hybridization, which may be expected for the gene tree topology-based methods.

Since ASTRAL assumes independent evolutionary histories for genes and its accuracy decreases for linked gene trees [38, 39], we simulated data under different recombination rates (1e-10 to 1e-6 per generation) under the simple hybridization model to better reflect the empirical data. At different recombination rates, the observed ILS indices were still consistent with the expected ILS index. However, at low recombination rates, the width of the 95% confidence interval (CI) for the observed ILS index significantly increased (Fig. 3a), indicating a higher variation than when based on independent gene trees. Similarly, the variation of the observed IH indices was high when recombination rates were low, although the observed IH indices were essentially consistent with the expected IH index at low ILS (Fig. 3b). In particular, when low recombination rates were combined with high ILS, the observed IH indices did not reflect the real IH (Fig. 3b). In other words, low recombination rates could lead to false IH signals under high ILS, even where no IH occurred. In addition, similar to our findings in simulated data without recombination, as ILS levels increased, the detectability of IH decreased and the error rates in the observed species tree topology increased. However, this decline in robustness was more serious in low recombination rates (Fig. S3). These results indicated that both low recombination rates and high ILS seriously impact the IH index, so the veracity of IH signals should be carefully considered at such scenarios.

Figure 3.

Figure 3

Evaluation of the IH and ILS indices in simulated data using a simple hybridization model with different recombination rates (the number in each gray box). (a-b) The distribution of the observed ILS index (a) and the observed IH index (b) in all simulated data. The shallows indicate 95% CI of the observed ILS or IH indices.

The performance of the IH and ILS indices in complex reticulation models

Although simple evolutionary models are seen in nature, complex reticulate models of evolution are perhaps more common in biological processes. We simulated several reticulate evolution models to investigate their effects on gene tree frequency patterns and our two test indices, as shown in Fig. 4. For reticulation models 1–2 (with a single hybridization event), significant IH index values were observed in multiple nodes, although the hybridization branch was set for only one node (Fig. 4a and b). For reticulation models 3–4, where two hybridization events occur dependently, significant IH index values were also observed in multiple nodes (Fig. 4c and d), reflecting complexity and difference from the expected IH index value. None of the complex models resulted in an accurate estimation of IH indices. However, the detection of significant IH index value can reflect the presence of IH that needs to be further investigated using network-based methods, such as PhyloNet.

Figure 4.

Figure 4

Evaluation of the IH and ILS indices in simulated data using complex reticulation models, with a single hybridization event in reticulation models 1–2 (a and b) and with two successive hybridization events in reticulation models 3–4 (c and d). The figures on the left show a schematic diagram of the model, and those on the right figure show the phylogenetic relationships and indicators of the simulated data. All H and T branches are set to IH index = 0.4 and ILS index = 0.368, respectively.

The performance of the IH and ILS indices in empirical data

We next evaluated Phytop using several previously reported cases of hybridization or introgression in Juglans regia and J. sigillata [12], the Ericales [13], Aegilops tauschii [16], and Triticum turgidum [16]. The ancestors of J. regia and J. sigillata are thought to have originated through hybridization between the American and Asian lineages [12]. We reconstructed the phylogenetic relationships among J. regia, J. sigillata, J. mandshurica (Asian lineage), and J. nigra (American lineage). We found the ILS index of 48.1% and the IH index of 44.2% at the node (J. nigra, J. regia + J. sigillata), reflecting a potential hybridization event involving both the American and Asian lineages into the ancestors of J. regia + J. sigillata (Fig. 5a). This is consistent with the results from previous studies [12] and further demonstrates the reliability of the ILS and IH indices in simple hybridization models.

Figure 5.

Figure 5

Evaluation of the ILS and IH indices in empirical cases of reticulate evolution. (a) Evaluation of the ILS and IH indices in the case of reticulate evolution within Juglans: J. regia, J. sigillata, J. mandshurica (Asian lineage), and J. nigra (American lineage). (b) Evaluation of the ILS and IH indices in the case of reticulate evolution among the Ericales, the Cornales and the Gentianidae. (c) Evaluation of the ILS and IH indices in the wheat D lineage (Ae. tauschii), and the A (T. urartu) and B (Ae. speltoides) lineages. (d) Evaluation of the ILS and IH indices in the allotetraploid emmer wheat (T. turgidum; AABB), and its progenitors: the A (T. urartu) and B (Ae. speltoides) lineages.

The Ericales are inferred to have originated from hybridization between the Cornales and the Gentianidae [13]. We selected genomic data from the Cornales, the Gentianidae, and the Ericales to reconstruct a phylogenetic species tree of these groups. The resulting IH index at the internal branch of ((Ericales, Cornales), Gentianidae) reached 14.6% (Fig. 5b), which is essentially consistent with the degree of introgression (22.3%) found in previous study [13]. This indicated potential gene introgression from the Gentianidae to the Ericales, supporting the previous hypothesis regarding a hybrid origin of the Ericales [13]. However, considering that there was also an high IH signal at the adjacent node (the common ancestor of the Cornales, the Gentianidae, and the Ericales), a more complex reticulate model should be considered, according to our simulation of complex reticulate models (Fig. 4).

The wheat D lineage is hypothesized to originate from hybridization between the A and B lineages [16]. Using Oryza sativa and Hordeum vulgare as outgroups, we reconstructed the phylogenetic relationships among Aegilops tauschii (DD), Ae. speltoides (BB), and Triticum urartu (AA). We found that the IH index at the internal branch of (Ae. speltoides, (Ae. tauschii, T. urartu)) was 39.0% (Fig. 5c), which is consistent with the degree of gene introgression (39–57%) estimated in previous studies [16]. Furthermore, we found that the ILS index was 88%, indicating a strong level of ILS among the three lineages.

The allotetraploid emmer wheat (Triticum turgidum; AABB) is thought to have originated from the hybridization of the A and B lineages [16]. The phylogenetic relationships among Ae. speltoides (BB), T. urartu (AA), and T. turgidum were reconstructed (Fig. 5d). Similar to the case with the D lineage, we found that both the IH index and the ILS index at the internal branch of (Ae. speltoides, (T. urartu, T. turgidum)) were at high levels (IH-i = 39.8%), which is consistent with the allotetraploid nature of T. turgidum potentially originating from the hybridization between Ae. speltoides and T. urartu.

These empirical cases suggest the reliability and accuracy of the IH index in detecting simple IH in real biological data. At the very least, the IH index is competitive with previously developed methods of inference and enables simple visualization.

The performance of the ILS and IH indices in large biological datasets

We next explored whether ILS and IH signals were widely present in the major clade of plants (angiosperms and gymnosperms). Angiosperms, which form the most advanced class in the plant kingdom, are the most widely distributed group of modern plants [40–42]. We reconstructed species trees for several representative groups of angiosperms, including the early angiosperms, the core eudicots, the asterids, and the rosids.

In the early angiosperm groups, the phylogenetic relationships among magnoliids, monocots, Chloranthales, eudicots, and Ceratophyllales have been controversial [2, 40, 41, 43–45]. In these groups, we found that ILS and IH signals were widespread and high (ILS-i ranged from 47.8% to 89.8%, IH-i ranged from 0% to 48.7%; Fig. 6a). Our results support monocots as the earliest divergent group, a sister of the monophyletic group comprising the other lineages (Fig. 6a). This is consistent with the nuclear tree and plastid tree of Hu et al. [43] and the nuclear trees of Ma et al. [2] and Guo et al. [45], but conflicts with the plastid tree of PPA II [41] and the nuclear tree of Qin et al. [46]. In this study, the eudicots clade was sister to Ceratophyllales, which is consistent with the nuclear tree and plastid tree of Hu et al., [43] and the plastid tree of PPA II [41]. However, the Kew Tree of Life Explorer (https://treeoflife.kew.org) [44] and a recent study including approximately 60% of angiosperm genera [40] support the Ceratophyllales as the earliest diverging group, which is the sister to a monophyletic group composed of the other lineages. These results suggest that the early diversification of the angiosperms may have involved complex reticulation. Similar to previous cases, the ILS and IH indices are sensitive indicators that can reflect these factors effectively.

Figure 6.

Figure 6

Detection of ILS and IH signals in representative groups of angiosperms and gymnosperms. Detection of the ILS and IH indices in the nodes in (a) early angiosperm groups, (b) early core eudicot groups, (c) the asterids, (d) the rosids, and (e) gymnosperm groups.

In the early diverging core eudicots, we found very strong ILS and IH signals among lineages, reflecting the severe ILS and IH events within the early diversification of core eudicots (Fig. 6b). The phylogenetic position of the Caryophyllales is also controversial in different studies [40, 41, 43, 44]. This study supports the position of the Caryophyllales as sister to the asterids, but correspondingly, the ILS and IH signals of these groups were high (Fig. 6b).

In the asterids group, ILS and IH were also prevalent. In our study, the Garryales was sister to the monophyletic group comprising the Solanales, Gentianales, and Lamiales, which was consistent with Hu et al. [43] and PPA II [41], but which conflicts with Zuntini et al. [40] and the Kew Tree of Life Explorer [44]. In addition, we found high ILS and IH signals (ILS-i = 81.1%, IH-i = 42.2%; Fig. 6c) between orders within the campanulids, including the Dipsacales, Asterales, and Apiales. The phylogenetic relationships among these three orders have also been inconsistent in previous studies [40, 41, 43, 44].

In the rosids groups, we also found widespread evidence of ILS and IH (Fig. 6d). The malvids and the fabids can both form monophyletic groups, with low ILS and no IH signal at their crown nodes (ILS-i = 32.4% at the crown node of the malvids and ILS-i = 49.4% at the crown node of the fabids; Fig. 6d). However, the relationships of the orders within the falvids or the fabids were not consistent [10, 11, 40, 41, 43, 44], where the ILS and IH indices were high (ILS-i ranged from 53.6% to 94.9%, IH-i ranged from 0% to 45.4%; Fig. 6d).

In gymnosperms, the phylogenetic position of the Gnetales has been much debated, with several varying hypotheses supported by different studies [47–50]. Our results support placing the Gnetales as sister to the Pinales (Fig. 6e). Moreover, ILS and IH signals were pervasive and strong (ILS-i ranged from 73.6% to 76.3%, IH-i ranged from 19.1% to 35.2%) across the major clades of gymnosperms (Fig. 6e), suggesting that complex reticulation events may have occurred among the main lineages of the gymnosperms (Fig. 6e) and necessitating further research.

In our study, it can be seen from the index that despite the high support, considerable ILS and IH signals were present in the species tree (Fig. 6). By visualizing the indices developed in this study using Phytop, these issues can be observed and further addressed to help interpret the data and infer the correct phylogenetic relationships.

Discussion

Understanding the history of life requires the establishment of well-resolved phylogenetic species trees, where the tree's inner nodes with high support are considered to be the final topology [51]. In general, assessment of support for phylogenetic trees relies on statistical support for bipartitions, i.e. the nonparametric bootstrap proportion (BP) or the posterior probability (PP). Higher BP and PP values close to 100% reflect a low level of uncertainty in that node of the phylogenetic tree and are considered to be high support of that topology. However, with the development of sequencing technology in recent years, there has been an exponential increase in the amount of phylogenomic data produced, together with a decrease in random errors. This has led to high support often being obtained, but this does not necessarily truly reflect the heterogeneity within the data [52].

Furthermore, even where nodes are well supported, conflicting species tree topologies can be obtained for the same group in different studies, such as in our study (Fig. 6) and previous studies [9–11, 43, 52]. ILS and IH are thought to be largely the cause of these phenomena [4, 53]. Here, we defined the ILS and IH indices to quantify the degree of ILS and IH, which can in turn be another assessment indicator for the support of a phylogenetic tree. Based on simulated data, our rough guidelines are that a node with ILS index < 50% and IH index <10% (or more strictly, ILS index < 30% and IH index < 5%) can be considered as well resolved with high confidence. By defining the ILS and IH indices, well-resolved and hard-to-resolve nodes can be conveniently identified based on the ILS and IH signals, within phylogenetic species trees constructed by ASTRAL [36], which is a popular tool family in the phylogenomic era.

With the functionality provided by Phytop, researchers can quickly predict ILS/IH among species from phylogenetic trees constructed in ASTRAL [36] and can adopt a divide-and-conquer strategy for the inference of phylogenetic networks. Well-resolved nodes with both high support and low ILS and IH indices can be first excluded, and then researchers can focus on analyzing the phylogenetic networks of species involved in inner nodes with high ILS and/or IH indices. This is undoubtedly very important because this strategy greatly reduces the complexity and time-consuming processes in the operation of existing phylogenetic network analysis tools (such as PhyloNetworks [22] and BPP [23]), and is necessary for further elucidating the evolutionary relationships among a large number of taxa. The Phytop method provides simple and effective visual identification for ILS/IH, allowing researchers to interpret the uncertainty in species tree topologies fully and simply, and guiding researchers to further transition to a network framework. Moreover, Phytop is easy to operate and requires only a short run time, meaning that it is easily mastered by users.

Under a simple hybridization model, using either simulated or real data, we were able to infer reticulate evolutionary relationships based on the ILS and the IH indices. Particularly in cases of simple hybrid origin, we found that the IH index obtained was highly consistent with results from previous empirical studies [12, 13, 16], which indicates the reliability of these indices in simple hybridization models. Although in complex reticulate evolutionary models, constrained by the framework of a binary tree, it can be difficult to accurately discern reticulate evolutionary relationships using the IH index, the IH signal can nevertheless inform us that reticulate evolution is present, and that further in-depth study with the network-based methods is necessary. This is a very important point: tree building, as a simple and effective method, is favored by researchers, but assessment of the potential presence of ILS/IH has often been neglected for a tree-based interpretation. Moreover, due to widespread ILS/IH, different species tree topologies are often obtained in different studies [10, 11], leading to conflict and confused interpretations of phylogenetic relationships. The visualization made possible by Phytop could provide a straightforward evaluation for the uncertain or hard-to-resolve nodes (such as the (Apiales, Asterales) node with a high ILS or IH in Fig. 6c) for further in-depth study.

In practice, we analyzed the ILS and IH signals among the major groups of angiosperms and gymnosperms, and found that there were numerous nodes with high ILS and/or IH signals, and the phylogenetic relationships of the lineages at these nodes often show conflicts in previous studies. Although the reasons for these conflicts are complex, the strong ILS and IH signals (ILS-i ranged from 47.8% to 94.9%, IH-i ranged from 0% to 49.7%; Fig. 6) suggest that their relationships became inconsistent likely due to both IH and ILS. Furthermore, high IH signals are often found in adjacent nodes, such as those among the Oxalidales, Brassicales, Celastrales, Malpighiales, and the Sapindales+Malvales (Fig. 6d). These patterns are similar to those seen in simulated data under complex hybridization models (Fig. 4). This means that introgression events among these adjacent groups might be intricate and cannot be inferred as simple hybridization events; instead, complex network should be considered to further resolve their relationships. In angiosperms, although many branches were in conflict, some branches with low ILS and IH signals can be considered well resolved. For example, given the low ILS and IH indices in the crown node of the monocots (ILS-i = 20%, IH-i = 0%; Fig. 6a) and the crown node of the eudicots (ILS-i = 7.2%, IH-i = 2.6%; Fig. 6a), the branches of both monocots and eudicots can be considered well resolved. Future studies could exclude these well-resolved nodes and focus more on those branches with large conflicts. Phytop can thus be used to identify the well-resolved and conflicting nodes, indicating directions for future research on the conflicting nodes. Additionally, some nodes only exhibit high ILS signals, for example, the internal branch of ((Solanales, Gentianales), Lamiales) (ILS-i = 92.9%, IH-i = 0%; Fig. 6c), where ILS is likely to be the main cause of the conflicts. However, this should still be verified, as our simulations indicated that the IH signal may be overridden by the high ILS. In addition, it is worth mentioning that the Solanales, Gentianales, and Lamiales form a monophyletic group, which is found consistently in previous studies [40, 41, 43, 44]. Correspondingly, both the ILS and IH indices in the crown node of ((Solanales, Gentianales), Lamiales) were low (ILS index = 13.8%, IH index = 0%; Fig. 6c). The relationship of the campanulids, lamiids, and Ericales+Cornales was also stable across studies [40, 43, 44], and the corresponding node also had a low ILS and IH indices in our study (ILS index = 23.3%, IH index = 0%; Fig. 6c), which further reflects the reliability of the indices as an indicator for the support of the phylogenetic tree. Our indices reveal the ILS and IH signals at conflicting locations, providing insights and guides into further reticulation research.

Nevertheless, IH index have shortcomings when distinguishing between alternative scenarios. Take the quartet (((L, R), S), O) in Fig. 1 as an example. In this scenario, when there is IH signaling in the (L, R) node, the IH index alone cannot rule out the possibility that the IH was derived from the outgroup O or another ‘ghost’ (either extinct or unsampled) lineage (i.e. ghost introgression). In fact, inflow, outflow, and ghost introgression cannot be distinguished based solely on gene tree topology, as they can lead to similar patterns of topology [30]. Even in a simple hybridization model, we can only make a simple inference about the direction and proportion of gene flow from the topology patterns and IH index, and this is only a hypothetical inference. Where the patterns of topology have been confused by alternative introgression scenarios, the IH index has no ability to distinguish them. Therefore, the use of a full likelihood method such as BPP [23] is recommended after using Phytop to screen for lineages with significant IH signals. Both topology and branch lengths of the gene trees are used in BPP to robustly detect ghost introgression and to disentangle different scenarios of gene flow between nonsister species, which cannot be accomplished using only topology-based methods [30].

Methods

Definition of ILS index and IH index in theory

Under the multispecies coalescent (MSC) model (only ILS), the two minor topologies (q2 and q3) have equal probability Inline graphic, where t is the coalescent time [9]. We define the ILS index as Inline graphic, so that when t = 0, it reaches 100% and the three topologies have equal probability (1/3), and the ILS index drops to <1% when t > 4.6. This index is more simple to understand than the coalescent time t. Given an ILS index value (denoted as L), the probability of a minor topology is Inline graphic and the coalescent time is Inline graphic. This index can therefore reflect the strength (0–100%) of ILS and the minor topology frequencies (0–0.33) of gene tree discordance.

When asymmetrical IH occurs, we define the IH index as the proportion contributed by the minor donor lineage. Given an IH index value (denoted as H), a lineage is considered to be admixed by H of one parental lineage and Inline graphic of the other parental lineage. This index is also simple to understand, reflecting the proportion (0%–50%) of the genome inherited from the minor parental lineage.

Estimation of ILS and IH indices

Given the frequencies q1, q2 and q3 (let 1 ≥ q1 ≥ q2 ≥ q3 ≥ 0; q1 + q2 + q3 = 1) of the three topologies in the species tree inferred by ASTRAL, we use a chi-square test to test the goodness-of-fit for the multispecies coalescent model [9]. If the P value is greater than 0.05, indicating that there is no significant difference between the observed and expected topologies, we say the topology frequencies can be well explained by only ILS (ILS can explain all the Inline graphic discordant gene trees, so ILS-e = Inline graphic). In this case, we use Inline graphic to represent the minor topology frequencies, and the ILS index (ILS-i) is estimated using the formula:

graphic file with name DmEquation1.gif

Otherwise (P ≤ 0.05), we say the smallest q3 can be explained by ILS (ILS can explain Inline graphic discordant gene trees, so ILS-e = Inline graphic) and the imbalance between q2 and q3 can be explained by IH (IH can explain Inline graphic discordant gene trees, so IH-e =Inline graphic). Thus, we use q3 to represent the minor topology frequencies from ILS, and estimate ILS index (ILS-i) using the formula:

graphic file with name DmEquation2.gif

Furthermore, ILS itself is expected to produce three topologies with equal probability (q3). Thus, with q3 all excluded, the IH index (IH-i) is estimated using the formula:

graphic file with name DmEquation3.gif

Note that other factors, such as horizontal gene transfer and gene duplication and loss, are not addressed in our assumption. Nevertheless, the symmetries in gene tree distributions can be addressed using the ILS index, and the asymmetries can be addressed using the IH index. In addition, it is worth noting that the two indices are calculated under a simple hybridization scenario (Fig. 2a) and may not be accurate for other scenarios (Fig. 4).

Simulation data

We simulated a mixed ILS + IH model using fastsimcoal2 [54]. Population effective sizes (Ne) were fixed to 1e5; T0 = T1 = 1e6 generations, the substitution rate = 1e-8 per generation, the recombination rate = 0 per generation, the locus length was set to 2000 bp; the IH index was set from 0% to 50% with a step of 5%, and the ILS index was set from 0 to 100% with a step of 10%, producing 121 (Inline graphic) combinations. The generation time (T) of internal branches was converted from the ILS index (L) using the formula: Inline graphic. Each scenario was simulated (‘fsc27 -n 10 000 -T’ at each time) to generate 10 000 independent gene trees. Then, the species tree was inferred using ASTRAL-Pro2 [36] and the two indices (i.e. the ILS index and the IH index) were calculated based on the observations of gene tree frequencies. The above procedure was repeated independently 1000 times to generate percentile-based 95% CIs for the two indices. In addition, T1 was set ranging from 1e3 to 1e8 generations to access the effect of species divergence times.

Furthermore, to access the accuracy for dependent gene trees, we simulated varying recombination rates ({1e-10, 1e-9, 1e-8, 1e-7, 1e-6}). By generating 10 000 linked subtrees with fastsimcoal2, we repeated the above procedure to calculate the two indices for each setting of recombination rates.

Real data

Five real biological datasets were used in this study:

  1. The wheat data: Marcussen et al. [16] revealed a hybrid origin of the D lineage from the A and B lineages in the wheat genome. The allotetraploid AABB lineage is thought to have originated from the hybridization of the A and B lineages. Here, we used the genomic data from T. urartu (AA), Aegilops speltoides (a close relative of BB), T. turgidum (AABB), and Ae. tauschii (DD) to test these inferences. Hordeum vulgare and Oryza sativa were used as the outgroups.

  2. The walnut data: Zhang et al. [12] demonstrated that Juglans regia (and its landrace J. sigillata) arose as a hybrid between the American and the Asian lineages. Here, we used the genomic data from J. regia, J. sigillata, J. mandshurica (Asian lineage), and J. nigra (American lineage) to test this inference. Carya illinoinensis was used as the outgroup.

  3. The Ericales data: Stull et al. [13] suggested the possibility that the Ericales was a reticulate lineage resulting from hybridization between ancestors of the Cornales (78%) and the Gentianidae (22%). Here, we used the genomic data from the Cornales, the Ericales, and the Gentianidae from our previous study [55] to test this inference. Arabidopsis thaliana, Malania oleifera, Vitis vinifera, and Oryza sativa were used as the outgroups.

  4. The angiosperm data: in addition to the above well-documented data, we also used genomic data from the angiosperms to test our method. This dataset included 45 orders and 45 families of angiosperms, as well as two outgroups (Amborella trichopoda and Nymphaea colorata).

  5. The gymnosperm data: we selected representative gymnosperms to construct a species tree with Physcomitrella patens as the outgroup.

Inferring species tree from biological data with ASTRAL

Orthologs were inferred using Orthofinder [56] for the above real data. MCScanX [57] was used to extract gene collinearity. Single-copy gene families were extracted from the collinearity results, allowing up to 50% missing taxa. Protein sequences in each family were aligned using MAFFT [58] and the alignments were converted into codon alignments using PAL2NAL [59]. The alignments were trimmed using trimAl, with the parameter ‘-automated1’ that is optimized for maximum likelihood (ML) phylogenetic tree reconstruction [60]. Then the ML phylogenetic trees were reconstructed by using the software IQ-TREE [61] with the automatically selected best-fit model [62] and 1000 bootstrap replicates [63]. The gene trees were then re-rooted using the Newick utilities [64]. The above procedure for reconstructing gene trees was incorporated in the SOI pipeline [65]. Based on the single-copy gene family trees, ASTRAL-Hybrid [66] was used to infer the species tree with quartet supports (i.e. counts of the three topologies, or q1, q2, and q3) which is weighted by gene tree uncertainty. A complete workflow of sample data is available at https://github.com/shang-hongyun/phytop_workflow.

Visualizing the species tree

We visualized the frequencies of the three topologies using bar charts and integrated the species tree illustration with the Python programming library ete3 [67]. Two indices, the ILS index and the IH index, were calculated (see above) to quantify the strength of ILS and IH signals. These steps were done with Phytop (https://github.com/zhangrengang/phytop) and only took a few minutes for a tree with dozens of species.

Supplementary Material

Web_Material_uhae330

Acknowledgements

We thank the two anonymous reviewers for their insightful comments and constructive suggestions, which have significantly improved the quality of manuscript and enhance the utility of Phytop. This study was financial support by the strategic priority research program of Kunming Institute of Botany, Chinese Academy of Sciences (KIBXD202401), Natural Science Foundation of China (32471734 and 32360336), CAS ‘Light of West China’ Program, Yunnan Provincial Science and Technology Mission (202404BI090014).

Contributor Information

Hong-Yun Shang, Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; University of Chinese Academy of Sciences, Beijing 101408, China.

Kai-Hua Jia, Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Jinan, Shandong 250100, China.

Nai-Wei Li, Jiangsu Key Laboratory for the Research and Utilization of Plant Resources, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing 210014, China.

Min-Jie Zhou, Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; University of Chinese Academy of Sciences, Beijing 101408, China.

Hao Yang, Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; University of Chinese Academy of Sciences, Beijing 101408, China.

Xiao-Ling Tian, College of characteristic plant research, Guiyang Institute of Humanities and Technology, Guiyang 550025, China.

Yong-Peng Ma, Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China.

Ren-Gang Zhang, Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; University of Chinese Academy of Sciences, Beijing 101408, China.

Author Contributions

R. Z. and Y. M. conceived and designed the study. R. Z. performed software development. H. S. and R. Z. wrote the manuscript. R. Z. and H. S completed the data analysis and prepared the figures. R. Z., Y. M., N. L., and K. J. revised the manuscript. M. Z., X. T., and H. Y. tested the software and helped with data analysis.

Data availability

The code for Phytop is freely available on GitHub (https://github.com/zhangrengang/phytop).

Conflict of interest statement

The authors declare no conflicts of interest.

References

  • 1. Wang  Z, Kang  M, Li  J. et al.  Genomic evidence for homoploid hybrid speciation between ancestors of two different genera. Nat Commun. 2022;13:1987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Ma  J, Sun  P, Wang  D. et al.  The Chloranthus sessilifolius genome provides insight into early diversification of angiosperms. Nat Commun. 2021;12:6929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Guo  C, Luo  Y, Gao  LM. et al.  Phylogenomics and the flowering plant tree of life. J Integr Plant Biol. 2023;65:299–323 [DOI] [PubMed] [Google Scholar]
  • 4. Feng  S, Bai  M, Rivas-González  I. et al.  Incomplete lineage sorting and phenotypic evolution in marsupials. Cell. 2022;185:1646–1660.e18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Rose  JP, Toledo  CAP, Lemmon  EM. et al.  Out of sight, out of mind: widespread nuclear and plastid-nuclear discordance in the flowering plant genus Polemonium (Polemoniaceae) suggests widespread historical gene flow despite limited nuclear signal. Syst Biol. 2021;70:162–80 [DOI] [PubMed] [Google Scholar]
  • 6. Cai  L, Xi  Z, Lemmon  EM. et al.  The perfect storm: gene tree estimation error, incomplete lineage sorting, and ancient gene flow explain the most recalcitrant ancient angiosperm clade, Malpighiales. Syst Biol. 2021;70:491–507 [DOI] [PubMed] [Google Scholar]
  • 7. Soltis  PS, Soltis  DE. The role of hybridization in plant speciation. Annu Rev Plant Biol. 2009;60:561–88 [DOI] [PubMed] [Google Scholar]
  • 8. Payseur  BA, Rieseberg  LH. A genomic perspective on hybridization and speciation. Mol Ecol. 2016;25:2337–60 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Degnan  JH, Rosenberg  NA. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol. 2009;24:332–40 [DOI] [PubMed] [Google Scholar]
  • 10. Wang  Z, Zhou  J, Pan  J. et al.  Insights into the superrosids phylogeny and flavonoid synthesis from the telomere-to-telomere gap-free genome assembly of Penthorum chinense Pursh. Hortic Res. 2024;11:uhad274 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Liu  L, Chen  M, Folk  RA. et al.  Phylogenomic and syntenic data demonstrate complex evolutionary processes in early radiation of the rosids. Mol Ecol Resour. 2023;23:1673–88 [DOI] [PubMed] [Google Scholar]
  • 12. Zhang  B, Xu  LL, Li  N. et al.  Phylogenomics reveals an ancient hybrid origin of the persian walnut. Mol Biol Evol. 2019;36:2451–61 [DOI] [PubMed] [Google Scholar]
  • 13. Stull  GW, Soltis  PS, Soltis  DE. et al.  Nuclear phylogenomic analyses of asterids conflict with plastome trees and support novel relationships among major lineages. Am J Bot. 2020;107:790–805 [DOI] [PubMed] [Google Scholar]
  • 14. Zhang  RG, Lu  C, Li  GY. et al.  Subgenome-aware analyses suggest a reticulate allopolyploidization origin in three Papaver genomes. Nat Commun. 2023;14:2204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Zhang  RG, Shang  HY, Jia  KH. et al.  Subgenome phasing for complex allopolyploidy: case-based benchmarking and recommendations. Brief Bioinform. 2024;25:bbad513 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Marcussen  T, Sandve  SR, Heier  L. et al.  Ancient hybridizations among the ancestral genomes of bread wheat. Science. 2014;345:1250092. [DOI] [PubMed] [Google Scholar]
  • 17. Blischak  PD, Chifman  J, Wolfe  AD. et al.  HyDe: a python package for genome-scale hybridization detection. Syst Biol. 2018;67:821–9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Allman  ES, Mitchell  JD, Rhodes  JA. Gene tree discord, simplex plots, and statistical tests under the coalescent. Syst Biol. 2022;71:929–42 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Malinsky  M, Matschiner  M, Svardal  H. Dsuite - fast D-statistics and related admixture evidence from VCF files. Mol Ecol Resour. 2021;21:584–95 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Zhang  C, Ogilvie  HA, Drummond  AJ. et al.  Bayesian inference of species networks from multilocus sequence data. Mol Biol Evol. 2018;35:504–17 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Wen  D, Yu  Y, Zhu  J. et al.  Inferring phylogenetic networks using PhyloNet. Syst Biol. 2018;67:735–40 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Solís-Lemus  C, Bastide  P, Ané  C. PhyloNetworks: a package for phylogenetic networks. Mol Biol Evol. 2017;34:3292–8 [DOI] [PubMed] [Google Scholar]
  • 23. Flouri  T, Jiao  X, Rannala  B. et al.  A bayesian implementation of the multispecies coalescent model with introgression for phylogenomic analysis. Mol Biol Evol. 2020;37:1211–23 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Green  RE, Krause  J, Briggs  AW. et al.  A draft sequence of the neandertal genome. Science. 2010;328:710–22 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Kong  S, Kubatko  LS. Comparative performance of popular methods for hybrid detection using genomic data. Syst Biol. 2021;70:891–907 [DOI] [PubMed] [Google Scholar]
  • 26. Allman  ES, Baños  H, Rhodes  JA. NANUQ: a method for inferring species networks from gene trees under the coalescent model. Algorithms Mol Biol. 2019;14:24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Yu  Y, Nakhleh  L. A maximum pseudo-likelihood approach for phylogenetic networks. BMC Genomics. 2015;16:S10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Yu  Y, Dong  J, Liu  KJ. et al.  Maximum likelihood inference of reticulate evolutionary histories. Proc Natl Acad Sci. 2014;111:16448–53 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Solís-Lemus  C, Ané  C. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet. 2016;12:e1005896 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Pang  X, Zhang  D. Detection of ghost introgression requires exploiting topological and branch length information. Syst Biol. 2024;73:207–22 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Stull  GW, Pham  KK, Soltis  PS. et al.  Deep reticulation: the long legacy of hybridization in vascular plant evolution. Plant J. 2023;114:743–66 [DOI] [PubMed] [Google Scholar]
  • 32. Hibbins  MS, Hahn  MW. Phylogenomic approaches to detecting and characterizing introgression. Genetics. 2022;220:iyab173 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Wen  D, Nakhleh  L. Coestimating reticulate phylogenies and gene trees from multilocus sequence data. Syst Biol. 2018;67:439–57 [DOI] [PubMed] [Google Scholar]
  • 34. Smith  SA, Moore  MJ, Brown  JW. et al.  Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants. BMC Evol Biol. 2015;15:150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Sayyari  E, Whitfield  JB, Mirarab  S. DiscoVista: interpretable visualizations of gene tree discordance. Mol Phylogenet Evol. 2018;122:110–5 [DOI] [PubMed] [Google Scholar]
  • 36. Zhang  C, Mirarab  S. ASTRAL-pro 2: ultrafast species tree reconstruction from multi-copy gene family trees. Bioinformatics. 2022;38:4949–50 [DOI] [PubMed] [Google Scholar]
  • 37. Glémin  S, Scornavacca  C, Dainat  J. et al.  Pervasive hybridizations in the history of wheat relatives. Sci Adv. 2019;5:eaav9188 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. He  W, Scornavacca  C, Chan  Y. The accuracy of species tree inference under gene tree dependence. bioRxiv. 2024; 2024.06.06.597697
  • 39. Feng  C, Wang  J, Liston  A. et al.  Recombination variation shapes phylogeny and introgression in wild diploid strawberries. Mol Biol Evol. 2023;40:msad049 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Zuntini  AR, Carruthers  T, Maurin  O. et al.  Phylogenomics and the rise of the angiosperms. Nature. 2024;629:843–50 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Li  H, Luo  Y, Gan  L. et al.  Plastid phylogenomic insights into relationships of all flowering plant families. BMC Biol. 2021;19:232. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Govaerts  R, Nic Lughadha  E, Black  N. et al.  The world checklist of vascular plants, a continuously updated resource for exploring global plant diversity. Sci Data. 2021;8:215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Hu  H, Sun  P, Yang  Y. et al.  Genome-scale angiosperm phylogenies based on nuclear, plastome, and mitochondrial datasets. J Integr Plant Biol. 2023;65:1479–89 [DOI] [PubMed] [Google Scholar]
  • 44. Baker  WJ, Bailey  P, Barber  V. et al.  A comprehensive phylogenomic platform for exploring the angiosperm tree of life. Syst Biol. 2022;71:301–19 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Guo  X, Fang  D, Sahu  SK. et al.  Chloranthus genome provides insights into the early diversification of angiosperms. Nat Commun. 2021;12:6930. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Qin  L, Hu  Y, Wang  J. et al.  Insights into angiosperm evolution, floral development and chemical biosynthesis from the Aristolochia fimbriata genome. Nat Plants. 2021;7:1239–53 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Ran  J. et al.  Phylogenomics resolves the deep phylogeny of seed plants and indicates partial convergent or homoplastic evolution between Gnetales and angiosperms. Proc R Soc B Biol Sci. 1881;2018:20181012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Gitzendanner  MA, Soltis  PS, Wong  GKS. et al.  Plastid phylogenomic analysis of green plants: a billion years of evolutionary history. Am J Bot. 2018;105:291–301 [DOI] [PubMed] [Google Scholar]
  • 49. Song  C, Fu  F, Yang  L. et al.  Taxus yunnanensis genome offers insights into gymnosperm phylogeny and taxol production. Commun Biol. 2021;4:1203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Liu  Y, Wang  S, Li  L. et al.  The Cycas genome and the early evolution of seed plants. Nat Plants. 2022;8:389–401 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Shen  X, Hittinger  CT, Rokas  A. Contentious relationships in phylogenomic studies can be driven by a handful of genes. Nat Ecol Evol. 2017;1:0126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Thomson  RC, Brown  JM. On the need for new measures of phylogenomic support. Syst Biol. 2022;71:917–20 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Zhang  D, Rheindt  FE, She  H. et al.  Most genomic loci misrepresent the phylogeny of an avian radiation because of ancient gene flow. Syst Biol. 2021;70:961–75 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Laurent  E, Matthieu  F. Fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics. 2011;27:1332–4 [DOI] [PubMed] [Google Scholar]
  • 55. Ma  H, Liu  Y, Liu  D. et al.  Chromosome-level genome assembly and population genetic analysis of a critically endangered rhododendron provide insights into its conservation. Plant J. 2021;107:1533–45 [DOI] [PubMed] [Google Scholar]
  • 56. Emms  DM, Kelly  S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Wang  Y, Tang  H, DeBarry  JD. et al.  MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49–9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Standley  DM, Katoh  K. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Suyama  M, Torrents  D, Bork  P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:W609–12 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Capella-Gutierrez  S, Silla-Martinez  J, Gabaldon  T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61. Nguyen  LT, Schmidt  HA, von Haeseler  A. et al.  IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32:268–74 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Kalyaanamoorthy  S, Minh  BQ, Wong  TKF. et al.  ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14:587–9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Hoang  DT, Chernomor  O, von Haeseler  A. et al.  UFBoot2: improving the ultrafast bootstrap approximation. Mol Biol Evol. 2017;35:518–22 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Junier  T, Zdobnov  EM. The Newick utilities: high-throughput phylogenetic tree processing in the Unix shell. Bioinformatics. 2010;26:1669–70 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Zhang  RG, Shang  HY, Zhou  MJ. et al.  Robust identification of orthologous synteny with the Orthology index and its applications in reconstructing the evolutionary history of plant genomes. bioRxiv. 2024; 2024.08.22.609065
  • 66. Zhang  C, Mirarab  S. Weighting by gene tree uncertainty improves accuracy of quartet-based species trees. Mol Biol Evol. 2022;39:msac215 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Huerta-Cepas  J, Serra  F, Bork  P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol Biol Evol. 2016;33:1635–8 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Web_Material_uhae330

Data Availability Statement

The code for Phytop is freely available on GitHub (https://github.com/zhangrengang/phytop).


Articles from Horticulture Research are provided here courtesy of Oxford University Press

RESOURCES