Genome Biology and Evolution. 2024 Oct 30;16(10):evae229. doi: 10.1093/gbe/evae229

Challenges in Assembling the Dated Tree of Life

Carlos G Schrago 1, Beatriz Mello 2
Editor: Laura Katz
PMCID: PMC11523137  PMID: 39475308

Abstract

The assembly of a comprehensive and dated Tree of Life (ToL) remains one of the most formidable challenges in evolutionary biology. The complexity of life's history, involving both vertical and horizontal transmission of genetic information, defies its representation by a simple bifurcating phylogeny. With the advent of genome and metagenome sequencing, vast amounts of data have become available. However, employing this information for phylogeny and divergence time inference has introduced significant theoretical and computational hurdles. This perspective addresses some key methodological challenges in assembling the dated ToL, namely, the identification and classification of homologous genes, accounting for gene tree-species tree mismatch due to population-level processes along with duplication, loss, and horizontal gene transfer, and the accurate dating of evolutionary events. Ultimately, the success of this endeavor requires new approaches that integrate knowledge databases with optimized phylogenetic algorithms capable of managing complex evolutionary models.

Keywords: phylogenomics, horizontal gene transfer, multispecies coalescent, timescales, networks, reticulation


Significance

Assembling a dated phylogeny of life is essential for understanding Earth's biodiversity and its evolutionary history. However, traditional models of vertical inheritance of genes are inadequate to fully describe the intricacies of the evolution of species. Advances in genome sequencing and computational phylogenetics offer unprecedented opportunities to explore these issues, but they also introduce new challenges in data management and software implementation. Addressing these obstacles is critical for developing a more accurate and complete representation of the ToL.

This Perspective is part of a series of articles celebrating 40 years since Molecular Biology and Evolution was founded. It is accompanied by virtual issues on this topic published by Genome Biology and Evolution and Molecular Biology and Evolution, which can be found at our 40th anniversary website.

Introduction

In 1837, Charles Darwin sketched the famous “I think” diagram on page 36 of his Notebook B. This branching depiction of evolution, now iconic, has become a cornerstone of modern evolutionary research. In The Origin of Species, he went as far as to suggest that every species is an endpoint of an unbroken ancestor-descendant stream of inheritance with modifications linking all organisms, forming a vast Tree of Life (ToL). This imagery has inspired biologists ever since, but until the advent of nucleotide and amino acid sequencing, the ToL could not be effectively investigated (Woese and Fox 1977). To celebrate the 40th anniversary of SMBE, this perspective explores the challenges of reconstructing the evolutionary relationships of life from the standpoint of computational and theoretical phylogenetics.

It is now clear that the entire history of cellular life is far more complex than a simple branching tree. The genome of every organism contains information derived from processes other than vertical transmission (Sagan 1967; Blais and Archibald 2021). Although this phenomenon was initially considered unique to prokaryotes, it is now widely accepted that parts of eukaryotic genomes also originated through horizontal gene transfer (HGT) (Husnik and McCutcheon 2018; Keeling 2024). Furthermore, endosymbiotic events have been pervasive throughout evolution (Sagan 1967; Bennett et al. 2024). These nonvertical transfers and mergers of genetic information make it impossible to represent life's phylogeny as a fully bifurcating tree. In fact, the ToL may reflect a small fraction of the genome (Dagan and Martin 2006) or serve as a statistical representation of the evolutionary history (O’Malley and Koonin 2011).

Therefore, one of the foremost challenges in constructing the ToL is perhaps determining how to accurately depict the genealogy of all organisms (Corel et al. 2016). This conceptual problem has been actively discussed for years (Doolittle 1999; Blais and Archibald 2021). It has been further complicated by the widespread recognition that the history of genomic segments might not match the diversification of species (Maddison 1997). Additionally, because the ToL depicts only the history of cellular life, the issue of how to relate this representation to the origin and evolution of viruses and other acellular replicators has attracted attention. Although the evolutionary origins of the distinctive viral replication machinery remain unclear (Krupovic et al. 2019), many viral genes are evolutionarily related to genes found in cellular life (Koonin et al. 2015). Therefore, although arguments have been made to exclude viruses from the ToL (Moreira and López-García 2009), links between the ToL and the virosphere can be established.

Fueled by the increasing affordability of genome sequencing, our understanding of deep phylogenetic relationships has expanded significantly in recent decades (Spang et al. 2022; Eme and Tamarit 2024). The growing significance of the topic is underscored by the number of published papers referencing the ToL, which is increasing at a faster rate than those mentioning phylogeny broadly (Fig. 1). Here, we outline key methodological challenges biologists face in assembling the ToL, including (i) the various components of data processing; (ii) phylogenetic inference, including reconciling gene tree-species tree discordance and accounting for reticulations; (iii) the realistic modeling of nucleotide and amino acid substitutions; (iv) inferring accurate timescales and devising models of rate evolution on lineages; and (v) addressing the computational demands posed by big data (Fig. 2).

Fig. 1.

Comparative counts of the number of studies mentioning “phylogeny” and “ToL” retrieved from the Scopus database for the period 1980-2023. Both linear regression models have R² > 0.95 and P < 0.001.

Fig. 2.

Major challenges for assembling the dated ToL.

Gathering and Organizing Sequence Data

The rapid pace of genome and metagenome sequencing is unprecedented and continues to accelerate. As more sequences become available, the critical task of filtering, classifying, and organizing this overwhelming volume of raw data cannot be overlooked (Stephens et al. 2015). Several databases have been developed to facilitate data assembly for phylogenetic studies, focusing on both unicellular and multicellular organisms (e.g. Goodstein et al. 2012; Uchiyama et al. 2019; Harrison et al. 2024). These databases, which include curated datasets and advanced phylogenomic pipelines, have allowed researchers to explore the ToL in greater depth, offering valuable resources for comparative studies (Cerón-Romero et al. 2019; Hernández-Plaza et al. 2023).

To investigate life's diversification, the standard approach involves compiling data sets of single-copy orthologs (SCOs), which are presumed to be predominantly vertically transmitted. This strategy, however, significantly limits the number of genes used, narrowing it to ∼50 when considering all living organisms (Moody et al. 2024). As we move from the last universal common ancestor (LUCA) toward the leaves of the ToL, the amount of orthologous sequence data employed to estimate its various subtrees increases, generating an imbalance. Even when studies focus on the same taxonomic level, besides sampling distinct terminal taxa, the genes used often differ (Petitjean et al. 2014; Zhu et al. 2019). Inferring SCOs for lineages separated by more than 3 billion years is difficult. Even if these genes are reliably identified, determining site-wise homology in deeply diverged sequences through alignment algorithms is far from straightforward (Landan and Graur 2009).

Because alignments are themselves estimates—a fact often neglected—phylogenetic methods have been developed to infer both alignments and trees simultaneously (Suchard and Redelings 2006) or even allow for alignment-free inference (Balaban et al. 2022). However, the effectiveness of these methods in improving the reconstruction of the ToL remains uncertain. After aligning sequences, a common step in phylogenomic pipelines is trimming positions with low phylogenetic information (Capella-Gutiérrez et al. 2009; Steenwyk et al. 2020), although this practice has been shown to reduce phylogenetic accuracy in some cases (Tan et al. 2015). For amino acid sequences, when alignments are unreliable due to high sequence dissimilarity, the structural conformation of proteins may provide evolutionary information for phylogenetic analysis (Bujnicki 2000; Malik et al. 2020). Data processing and classification algorithms are expected to continue improving, revealing previously unknown homology associations (Pavlopoulos et al. 2023).

From Gene Genealogies to the ToL

Aside from the challenges of identifying homologous genes, theoreticians have demonstrated that the standard approach of concatenating individual alignments into supermatrices for phylogenetic reconstruction can lead to biased tree topology estimates (Kubatko and Degnan 2007). Alternatively, topological heterogeneity along the genome, arising from the independent evolutionary histories of unlinked genomic segments (“gene trees”), provides valuable information for inferring the species tree (i.e. phylogeny). Discordance between the species tree and gene trees is due to various factors, including statistical errors caused by limited sampling or model misspecification (Delsuc et al. 2005), as well as genuine biological processes (Degnan and Rosenberg 2009). Distinguishing between these causes is not straightforward. Most advancements in theoretical and computational phylogenetics over the past two decades have focused on developing methods that accommodate the various biological factors causing this topological discordance (Edwards 2009).

The first major theoretical breakthrough in this field was the development of the multispecies coalescent (MSC) model, which extends Kingman's coalescent (Kingman 1982) from single species to gene genealogies across multiple species. The MSC provides theoretical expectations for the distribution of gene trees and phylogenetic parameters, enabling the assessment of incomplete lineage sorting (ILS) (Rannala and Yang 2003; Degnan and Salter 2005). Although this model was introduced in the 1980s (Tajima 1983; Pamilo and Nei 1988), interest in this approach only surged with the advent of sequencing technologies that allowed for the assembly of multigene datasets. Currently, phylogenetic inference using MSC has become routine. However, its implementation is computationally intensive, and full parametric inference remains prohibitive in most cases, even within the Bayesian framework using Markov chain Monte Carlo (MCMC) methods (Heled and Drummond 2010). This has already prompted, and will likely continue to drive, the development of more efficient summary methods (Mirarab et al. 2014).
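
As a minimal illustration of the kind of expectation the MSC provides, the sketch below evaluates the classical three-species case. For a rooted species tree ((A,B),C) with one sampled lineage per species, the gene tree topology matches the species tree with probability 1 - (2/3)exp(-T), where T is the length of the internal branch in coalescent units. The function names are illustrative and do not come from any software package.

```python
import math


def concordance_probability(t_coalescent):
    """P(gene tree topology matches the species tree) for three species."""
    if t_coalescent < 0:
        raise ValueError("branch length must be non-negative")
    # The two lineages fail to coalesce in the internal branch with
    # probability exp(-T); in that case all three topologies are equally likely.
    return 1.0 - (2.0 / 3.0) * math.exp(-t_coalescent)


def discordant_probability(t_coalescent):
    """Probability of each of the two discordant topologies."""
    if t_coalescent < 0:
        raise ValueError("branch length must be non-negative")
    return (1.0 / 3.0) * math.exp(-t_coalescent)
```

When T approaches zero, the three topologies become equally probable, which is the regime where ILS is most severe; as T grows, discordance due to ILS vanishes.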

Although MSC accounts for ILS, it addresses only one of the causes contributing to topological mismatches between gene genealogies and the species phylogeny. Gene duplication and loss (DL), along with HGT, are other common processes generating gene-species tree discrepancy, both inducing nonorthologous relationships between sequences (Fig. 3). Therefore, to expand the number of genes used to assemble the ToL beyond SCOs, paralogy and nonvertical transmission found in gene families must be addressed. In such cases, gene trees contain nodes unrelated to between-species divergence, and these must be reconciled with the species tree using methods that either incorporate explicit models of gene family evolution accounting for DL and HGT (Szöllősi et al. 2013) or rely on parsimony (Bayzid and Warnow 2018). Recent methodological approaches use alignments of entire gene families to estimate the species phylogeny, thereby overcoming the limitations of relying exclusively on SCOs (Boussau et al. 2013; Morel et al. 2024).
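
The parsimony route to reconciliation can be sketched with the classical last-common-ancestor (LCA) mapping, shown here for duplications only. This is a simplified illustration, not the algorithm of any study cited above: trees are nested tuples, gene tree leaves are labeled with the name of the species they were sampled from, and losses and transfers are ignored. All names are hypothetical.

```python
def species_clades(tree):
    """Collect the leaf set (clade) of every node in a nested-tuple species tree."""
    clades = []

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        clades.append(leaves)
        return leaves

    visit(tree)
    return clades


def lca_clade(clades, leaves):
    """Smallest species clade containing all the given species (their LCA)."""
    return min((c for c in clades if leaves <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene tree nodes inferred as duplications under LCA mapping."""
    clades = species_clades(species_tree)
    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            leaves = frozenset([node])
            return leaves, lca_clade(clades, leaves)
        results = [visit(child) for child in node]
        leaves = frozenset().union(*(r[0] for r in results))
        mapping = lca_clade(clades, leaves)
        # A node mapping to the same species node as one of its children
        # cannot be explained by speciation alone.
        if any(child_mapping == mapping for _, child_mapping in results):
            duplications += 1
        return leaves, mapping

    visit(gene_tree)
    return duplications
```

For the species tree (("A", "B"), "C"), a gene tree with the same shape needs no duplication, whereas (("A", "C"), "B") requires one at the root. Under this duplication-only scheme, a discordance that was actually caused by ILS or HGT would be misread as duplication followed by loss.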

Fig. 3.

Biological processes that generate gene tree-species tree discordance. Open circles indicate coalescent events. (a) ILS can lead to discrepancies between the gene tree and the species tree; (b) HGT, followed by the fixation of the transferred allele, results in a genealogy that differs from the species tree; (c) Ancestral gene duplication, marked by a filled black circle, and subsequent differential loss of the duplicated copies contribute to discordance between the gene tree and the species phylogeny.

Ideally, phylogenetic inference with reconciliation should incorporate the MSC process in addition to considering duplication, loss, and transfer (DLT). However, co-estimating species and gene trees using an MSC + DLT model is complex (Mirarab et al. 2021). Consequently, efforts have primarily focused on estimating phylogenies under MSC while at least considering DL (Rasmussen and Kellis 2012; Zhang et al. 2020). Recent developments model the MSC process within networks, incorporating introgression between lineages and thereby creating branches where nonvertical transfer of information has occurred (Wen et al. 2018). While introgression and HGT are different phenomena, they both involve nonvertical transmission. Thus, networks provide a theoretical foundation for integrating DLT with MSC into a comprehensive inferential framework.

Although promising, networks introduce substantial conceptual changes. Biologists are accustomed to reading the evolutionary history through acyclic graphs, making the interpretation of reticulations more difficult. Moreover, statistical evaluation of trees typically relies on branch support values that require summarizing topologies, such as the bootstrap and the Bayesian posterior probabilities. Network summarization is computationally more demanding, necessitating adjustments to these traditional metrics (Solís-Lemus et al. 2017). These difficulties pave the way for future research and methodological innovations.

Most phylogenetic algorithms produce unrooted tree topologies, leaving the temporal direction undefined. In fact, one of the most reported challenges across the ToL is determining the root of various lineages. Tree rooting is typically achieved using outgroups, but this approach becomes even more problematic when addressing the LUCA, as no living outgroups exist. In such cases, rooting can be achieved using gene families that duplicated before the divergence of all extant cellular life (Gogarten et al. 1989). However, since duplicate copies may be unavailable and establishing paralog groups is also prone to error, alternative methods can be applied (Tria et al. 2017). Finally, tree rooting can also be implemented using DLT reconciliation methods (Williams et al. 2017; Coleman et al. 2021). Moreover, recently proposed phylogenetic inference algorithms accounting for DLT yield rooted tree topologies without requiring outgroups (Morel et al. 2022, 2024). Correct root placement is crucial for resolving key questions related to the evolution of major lineages, including Eukarya (Cerón-Romero et al. 2022), Archaea (Petitjean et al. 2014), and Bacteria (Coleman et al. 2021).

Modeling Sequence Evolution

Regardless of the phylogenetic method, tree inference from alignments relies on explicit nucleotide or amino acid substitution models. Biologically, these models must account for numerous biochemical and life-history changes that genomes have undergone over billions of years and countless rounds of DNA replication. Developing realistic models of sequence evolution is a difficult task, and it underlies several unresolved issues related to the ToL (reviewed by Williams et al. 2021). When models are misspecified, systematic errors resulting in long-branch attraction (Felsenstein 1978) and other phylogenetic anomalies might emerge. Because of the massive number of sampled sites in genomic studies, biased estimates with seemingly low uncertainty are frequently obtained.

For improving sequence evolution models, two main factors are recurrently cited: allowing for evolutionary rate heterogeneity and incorporating multiple site-specific profiles. Rate heterogeneity among sequence sites is typically addressed using a discretized Gamma distribution (Yang 1994), allowing the site likelihood to be summed over a predefined number of rate categories over the entire tree. This approach assumes that, while among-site variation exists, site-specific rates are homogeneous across the tree branches (Goloboff et al. 2019). For inferring the deep divergences of the ToL, this assumption may be unrealistic and should ideally be reconsidered (Tuffley and Steel 1998; Galtier 2001).
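
The summation over rate categories can be made concrete with a deliberately small case: two aligned sequences under the Jukes-Cantor (JC69) model, where the site likelihood has a closed form. The category rates below are hypothetical placeholders with mean 1; in practice they would be the category means of a discretized Gamma distribution. Function names are illustrative.

```python
import math


def jc69_pair_site_likelihood(same_state, distance):
    """Likelihood of one site for two sequences under JC69.

    `distance` is the expected number of substitutions per site between them.
    """
    decay = math.exp(-4.0 * distance / 3.0)
    if same_state:
        transition = 0.25 + 0.75 * decay
    else:
        transition = 0.25 - 0.25 * decay
    # Stationary frequency of the first state times the transition probability.
    return 0.25 * transition


def site_likelihood_rate_categories(same_state, distance, category_rates):
    """Average the site likelihood over equally weighted rate categories."""
    total = sum(
        jc69_pair_site_likelihood(same_state, rate * distance)
        for rate in category_rates
    )
    return total / len(category_rates)


# Hypothetical category rates with mean 1.0 (assumption for illustration only).
EXAMPLE_RATES = [0.1, 0.5, 1.0, 2.4]
```

Each category rescales the same distance, so every site shares one set of rates across the whole tree. This is exactly the assumption questioned in the paragraph above.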

In addition to rate variation, accounting for different substitution profiles and compositional heterogeneity among sites also improves the accuracy of phylogenetic relationships between species that diverged deeper in time (Kapli et al. 2021). This can be achieved by partitioning data in advance or using mixture substitution models that integrate the likelihood function across different compositional categories (Lartillot and Philippe 2004). Unlike partition models, mixture models offer the advantage of not requiring predefined site-to-model assignments. Addressing both rate and substitution profile heterogeneity simultaneously requires more advanced models (Crotty et al. 2020). However, the benefit of greater complexity is counterbalanced by the downside of an expanded parameter space, which raises the time needed for analysis (Wang et al. 2018). Researchers are thus actively developing approximate solutions to balance accuracy and computational efficiency (e.g. Banos et al. 2024).
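
To show how a mixture model avoids assigning sites to classes in advance, the sketch below computes a two-sequence site likelihood as a weighted sum over profile classes. It assumes an F81-type model in which each class differs only in its stationary frequencies, uses a nucleotide alphabet for brevity, and relies on two made-up profiles. None of this reproduces a specific published model.

```python
import math


def f81_pair_site_likelihood(x, y, distance, profile):
    """Site likelihood for states x and y under an F81-type model.

    `profile` maps each state to its stationary frequency; the rate is scaled
    so that `distance` is in expected substitutions per site.
    """
    beta = 1.0 / (1.0 - sum(f * f for f in profile.values()))
    stay = math.exp(-beta * distance)
    transition = (stay if x == y else 0.0) + (1.0 - stay) * profile[y]
    return profile[x] * transition


def mixture_site_likelihood(x, y, distance, profiles, weights):
    """Weighted sum of class-specific likelihoods; no site-to-class assignment."""
    return sum(
        weight * f81_pair_site_likelihood(x, y, distance, profile)
        for profile, weight in zip(profiles, weights)
    )


# Hypothetical profiles (assumptions for illustration only).
AT_RICH = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
GC_RICH = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
```

Because every class contributes to every site, the cost of evaluating the likelihood grows with the number of classes, which is the computational burden noted above.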

Estimating the Timescale of Life

As new lineages are discovered and added to the ToL, establishing the timeline of species diversification is a key step in uncovering the major patterns of evolution (Hedges et al. 2015). To convert branch lengths, measured in units of substitutions per site, into units of absolute time, additional information on either time, substitution rates, or both is required. Temporal information is primarily derived from fossils, while substitution rates are sourced from empirical estimates (e.g. Bergeron et al. 2023) or borrowed from previous molecular dating studies. Since the early 2000s, timescale estimation has been conducted predominantly within a Bayesian framework (Kishino et al. 2001). In this approach, branch lengths are decomposed using an explicit model of how substitution rates change from branch to branch, combined with calibration information via probabilistic distributions (Dos Reis et al. 2016; Bromham et al. 2018).
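
The relationship between branch length, rate, and time can be shown with the simplest possible case, a strict clock calibrated by a single node of known age. This is only a sketch of the arithmetic; the Bayesian methods described above replace the single rate with a model of rate variation among branches and the point calibration with a probability distribution. Function names are illustrative.

```python
def strict_clock_rate(pairwise_distance, calibration_age):
    """Rate in substitutions per site per time unit from one calibrated split.

    Both lineages accumulate substitutions since the split, hence the factor 2.
    """
    if calibration_age <= 0:
        raise ValueError("calibration age must be positive")
    return pairwise_distance / (2.0 * calibration_age)


def divergence_time(pairwise_distance, rate):
    """Age of a split given the pairwise distance and a constant rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate)


def branch_length(rate, duration):
    """Expected substitutions per site along a branch: rate times time."""
    return rate * duration
```

With hypothetical values, a pair separated by 0.2 substitutions per site and calibrated at 100 million years implies a rate of 0.001 per site per million years; a second pair at distance 0.5 would then be dated to about 250 million years.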

Although several subtrees of the ToL have abundant fossil records for calibrating node ages, these lineages are mostly younger than 500 million years, and they are heavily biased toward multicellular organisms with hard tissues. To improve the precision of age estimates across the entire phylogeny of life, fossil calibrations from single-celled organisms older than one billion years are required. However, such findings are rare, and only a few fossils have been used for this purpose (see detailed paleontological justifications in Moody et al. 2024). One key result from recent timescales is that the age of the LUCA is approximately 4 billion years, which is close to the maximum bound set by the Moon-forming impact (Betts et al. 2018; Kumar et al. 2022; Craig et al. 2023).

Dating the ToL can be achieved using genes that originated from duplication events pre-LUCA. Although very few gene families meet this criterion, this information can be leveraged to cross-brace the nodes of the phylogeny, effectively doubling the number of calibrated nodes (Shih and Matzke 2013; Mahendrarajah et al. 2023; Moody et al. 2024). When incorporated into a model of gene family evolution, HGT can also inform divergence times, as this process occurs between lineages that coexisted in the past (Szöllősi et al. 2012; Davín et al. 2018; Wolfe and Fournier 2018). While node calibration has been the standard procedure in dating the ToL, Bayesian divergence time priors can be adjusted to account for the constraints imposed by HGT (Szöllősi et al. 2022).

Regarding evolutionary rate variation across the ToL, Bayesian prior distributions used for modeling substitution rate evolution are generally classified as either autocorrelated or uncorrelated (reviewed by Mello and Schrago 2024). When constructing a dated ToL, it is biologically expected that rate autocorrelation between branches would diminish over more than 3 billion years of evolution. However, due to the vast time span involved, even uncorrelated models–which assume a single underlying prior distribution for substitution rates along the phylogeny–might be inadequate. As a result, studies often report node age estimates using several rate models (Betts et al. 2018; Moody et al. 2024).
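
The contrast between the two classes of rate priors can be sketched by simulating rates along a single lineage. In the version below, the uncorrelated model draws each branch rate independently from one log-normal distribution, while the autocorrelated model draws each rate from a log-normal centered on the rate of the preceding branch, with variance proportional to branch duration. The parameterization and names are assumptions made for illustration and do not correspond to a particular dating program.

```python
import math
import random


def uncorrelated_rates(n_branches, mean_rate, sigma, rng):
    """Independent log-normal rates sharing one underlying distribution."""
    mu = math.log(mean_rate) - sigma ** 2 / 2.0  # keeps the expected rate at mean_rate
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def autocorrelated_rates(durations, root_rate, nu, rng):
    """Rates along successive branches; each depends on the previous one.

    `nu` controls how fast the log rate drifts per unit of time.
    """
    rates = []
    parent_rate = root_rate
    for duration in durations:
        sd = math.sqrt(nu * duration)
        log_mean = math.log(parent_rate) - sd ** 2 / 2.0
        child_rate = math.exp(rng.gauss(log_mean, sd))
        rates.append(child_rate)
        parent_rate = child_rate
    return rates
```

Passing a seeded generator such as `random.Random(1)` makes the draws reproducible. With `nu` set to zero the autocorrelated rates never leave the root rate, and as `nu` times the elapsed time grows, the memory of the ancestral rate fades, which is the intuition behind expecting weak autocorrelation over billions of years.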

Due to computational limitations, a full Bayesian inference of the dated ToL, which involves estimating both the phylogeny and timescale, is rarely implemented. The typical approach is to fix the tree topology for estimating divergence times using the MCMC algorithm. To speed up computations, approximations of the tree likelihood can be employed (Kishino et al. 2001; Reis and Yang 2011). Fixing the tree topology, however, may reduce the impact of nucleotide and amino acid substitution models on the estimates of branch lengths and, consequently, divergence times (Tao et al. 2020).

Integrative Approaches and Fast Methods

The recent data deluge has presented many computational challenges in assembling the universal dated phylogeny of organisms. Since it is impractical to re-estimate the entire ToL each time a new genome becomes available, alternative strategies are necessary (Kramer et al. 2023). One obvious solution is to combine the results from the hundreds of independent phylogenetic analyses published each month in specialized journals, thereby parallelizing the assembly of the ToL and pooling the efforts of researchers worldwide. Such integrative approaches require acquiring, curating, and developing new analytical tools for data assembly and statistical evaluation of the resulting megaphylogeny.

Integrative data acquisition and curation have been implemented in many databases (e.g. Kumar et al. 2022), but fewer studies have addressed the best methods for combining information from multiple sources, summarizing, and potentially evaluating the various parameters of the ToL (Truszkowski et al. 2023; Sánchez Reyes et al. 2024). This is crucial because many clades of the ToL are understudied, leading to biased sampling in the published phylogenies that contribute to the construction of subtrees. Ideally, these knowledge databases should consider and weigh the factors influencing phylogenetic inference, such as sequence length, taxonomic sampling, and the confidence metrics associated with the original trees (e.g. topological support for branches and credibility intervals for divergence times). Collecting this information would be much easier if published phylogenies were required to be stored in public, free-of-charge databases in standardized, machine-readable formats, similar to how nucleotide and amino acid sequences are uploaded to repositories. In case dated phylogenies are to be estimated de novo, methods that optimize computational time are essential, and benchmarking studies comparing the relative performance of time-consuming and faster alternatives are invaluable in the genomic age (Mello et al. 2017; Barba-Montoya et al. 2021; Costa et al. 2022).

Conclusion and Future Developments

Computational and theoretical phylogenetics has advanced significantly since the early works of the 1960s, and newer approaches are essential for estimating the deep evolutionary relationships in the ToL. Nevertheless, several fundamental questions remain unanswered: Is there sufficient empirical evidence to support the view that vertical transmission events primarily drive the evolution of life? Or might a reticulated phylogeny provide a more accurate depiction of life's deep evolutionary history? If the latter, how can we test competing hypotheses, interpret, and assign statistical support to the branches (edges) of such networks? Additionally, divergence time estimation would benefit from accounting for nonvertical events. In the widely used Bayesian framework, the adoption of priors that consider cyclic graphs has been introduced, offering promising avenues for integrating multiple sources of genetic transmission in dating the ToL (Flouri et al. 2020).

It is safe to say that the coming years will witness a series of methodological advancements that address many of these questions. As in other areas of evolutionary biology, future developments will likely be driven by the influx of high-quality genomes resulting from advances in sequencing technologies. Also in the technical domain, machine learning algorithms and AI, given their impressive performance, are likely to play a significant role in collecting and organizing phylogenomic data and improving pipelines (e.g. Kirilenko et al. 2023). To formulate testable hypotheses about the history of life using sequence data, biologists must tackle the management of data overflow, from storage to analysis, and the energy demands incurred throughout the process (Kumar 2022). This environmentally sustainable framework for conducting bioinformatic research is called green computing (Grealey et al. 2022).

These new standards should be considered by both theoreticians and software developers, as analytic resources are limited and unevenly distributed within the scientific community. Overcoming these obstacles is a crucial step toward the democratization of evolutionary science. Fully assembling the dated ToL will require substantial efforts in sequencing and analysis of unsampled species, particularly those from biodiversity hotspots primarily located in the Global South. Therefore, the active involvement of the local scientific community in this endeavor should be encouraged. In this regard, scientific societies have a central role, especially in promoting initiatives that address this issue, such as those implemented by SMBE (Eyre-Walker and Katz 2024).

Acknowledgments

C.G.S. is supported by CNPq grants 409963/2023-2, 401725/2022-7, and 309165/2019-9. B.M. is supported by Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ) grants E-26/211.248/2019, and E-26/201.446/2022; and by CNPq grant 311231/2022-5.

Contributor Information

Carlos G Schrago, Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.

Beatriz Mello, Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.

Data Availability

No new data were generated or analyzed in support of this research.

Literature Cited

  1. Balaban M, Bristy NA, Faisal A, Bayzid MS, Mirarab S. Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model. Bioinform Adv. 2022:2(1):vbac055. 10.1093/bioadv/vbac055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Banos H, Wong TKF, Daneau J, Susko E, Minh BQ, Lanfear R, Brown MW, Eme L, Roger AJ.. GTRpmix: a linked general time-reversible model for profile mixture models. Mol Biol Evol. 2024:41(9):msae174. 10.1093/molbev/msae174. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Barba-Montoya J, Tao Q, Kumar S. Assessing rapid relaxed-clock methods for phylogenomic dating. Genome Biol Evol. 2021:13(11):evab251. 10.1093/gbe/evab251. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bayzid MS, Warnow T. Gene tree parsimony for incomplete gene trees: addressing true biological loss. Algorithms Mol Biol. 2018:13(1):1. 10.1186/s13015-017-0120-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bennett GM, Kwak Y, Maynard R. Endosymbioses have shaped the evolution of biological diversity and complexity time and time again. Genome Biol Evol. 2024:16(6):evae112. 10.1093/gbe/evae112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Bergeron LA, Besenbacher S, Zheng J, Li P, Bertelsen MF, Quintard B, Hoffman JI, Li Z, St Leger J, Shao C, et al. Evolution of the germline mutation rate across vertebrates. Nature. 2023:615(7951):285–291. 10.1038/s41586-023-05752-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Betts HC, Puttick MN, Clark JW, Williams TA, Donoghue PCJ, Pisani D.. Integrated genomic and fossil evidence illuminates life's early evolution and eukaryote origin. Nat Ecol Evol. 2018:2(10):1556–1562. 10.1038/s41559-018-0644-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Blais C, Archibald JM. The past, present and future of the tree of life. Curr Biol. 2021:31(7):R314–R321. 10.1016/j.cub.2021.02.052. [DOI] [PubMed] [Google Scholar]
  9. Boussau B, Szöllosi GJ, Duret L, Gouy M, Tannier E, Daubin V.. Genome-scale coestimation of species and gene trees. Genome Res. 2013:23(2):323–330. 10.1101/gr.141978.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Bromham L, Duchêne S, Hua X, Ritchie AM, Duchêne DA, Ho SYW.. Bayesian molecular dating: opening up the black box. Biol Rev Camb Philos Soc. 2018:93(2):1165–1191. 10.1111/brv.12390. [DOI] [PubMed] [Google Scholar]
  11. Bujnicki JM. Phylogeny of the restriction endonuclease-like superfamily inferred from comparison of protein structures. J Mol Evol. 2000:50(1):39–44. 10.1007/s002399910005. [DOI] [PubMed] [Google Scholar]
  12. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009:25(15):1972–1973. 10.1093/bioinformatics/btp348.
  13. Cerón-Romero MA, Fonseca MM, de Oliveira Martins L, Posada D, Katz LA. Phylogenomic analyses of 2,786 genes in 158 lineages support a root of the eukaryotic tree of life between opisthokonts and all other lineages. Genome Biol Evol. 2022:14(8):evac119. 10.1093/gbe/evac119.
  14. Cerón-Romero MA, Maurer-Alcalá XX, Grattepanche J-D, Yan Y, Fonseca MM, Katz LA. PhyloToL: a taxon/gene-rich phylogenomic pipeline to explore genome evolution of diverse eukaryotes. Mol Biol Evol. 2019:36(8):1831–1842. 10.1093/molbev/msz103.
  15. Coleman GA, Davín AA, Mahendrarajah TA, Szánthó LL, Spang A, Hugenholtz P, Szöllősi GJ, Williams TA. A rooted phylogeny resolves early bacterial evolution. Science. 2021:372(6542):eabe0511. 10.1126/science.abe0511.
  16. Corel E, Lopez P, Méheust R, Bapteste E. Network-thinking: graphs to analyze microbial complexity and evolution. Trends Microbiol. 2016:24(3):224–237. 10.1016/j.tim.2015.12.003.
  17. Costa FP, Schrago CG, Mello B. Assessing the relative performance of fast molecular dating methods for phylogenomic data. BMC Genomics. 2022:23(1):798. 10.1186/s12864-022-09030-5.
  18. Craig JM, Kumar S, Hedges SB. The origin of eukaryotes and rise in complexity were synchronous with the rise in oxygen. Front Bioinform. 2023:3:1233281. 10.3389/fbinf.2023.1233281.
  19. Crotty SM, Minh BQ, Bean NG, Holland BR, Tuke J, Jermiin LS, von Haeseler A. GHOST: recovering historical signal from heterotachously evolved sequence alignments. Syst Biol. 2020:69(2):249–264. 10.1093/sysbio/syz051.
  20. Dagan T, Martin W. The tree of one percent. Genome Biol. 2006:7(10):118. 10.1186/gb-2006-7-10-118.
  21. Davín AA, Tannier E, Williams TA, Boussau B, Daubin V, Szöllősi GJ. Gene transfers can date the tree of life. Nat Ecol Evol. 2018:2(5):904–909. 10.1038/s41559-018-0525-3.
  22. Degnan JH, Rosenberg NA. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol. 2009:24(6):332–340. 10.1016/j.tree.2009.01.009.
  23. Degnan JH, Salter LA. Gene tree distributions under the coalescent process. Evolution. 2005:59:24–37. 10.1111/j.0014-3820.2005.tb00891.x.
  24. Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005:6(5):361–375. 10.1038/nrg1603.
  25. Doolittle WF. Phylogenetic classification and the universal tree. Science. 1999:284(5423):2124–2128. 10.1126/science.284.5423.2124.
  26. dos Reis M, Donoghue PCJ, Yang Z. Bayesian molecular clock dating of species divergences in the genomics era. Nat Rev Genet. 2016:17(2):71–80. 10.1038/nrg.2015.8.
  27. Edwards SV. Is a new and general theory of molecular systematics emerging? Evolution. 2009:63(1):1–19. 10.1111/j.1558-5646.2008.00549.x.
  28. Eme L, Tamarit D. Microbial diversity and open questions about the deep tree of life. Genome Biol Evol. 2024:16(4):evae053. 10.1093/gbe/evae053.
  29. Eyre-Walker A, Katz LA. Editorial 2024. Genome Biol Evol. 2024:16(2):evae012. 10.1093/gbe/evae012.
  30. Felsenstein J. Cases in which parsimony or compatibility methods will be positively misleading. Syst Biol. 1978:27(4):401–410. 10.1093/sysbio/27.4.401.
  31. Flouri T, Jiao X, Rannala B, Yang Z. A Bayesian implementation of the multispecies coalescent model with introgression for phylogenomic analysis. Mol Biol Evol. 2020:37(4):1211–1223. 10.1093/molbev/msz296.
  32. Galtier N. Maximum-likelihood phylogenetic analysis under a covarion-like model. Mol Biol Evol. 2001:18(5):866–873. 10.1093/oxfordjournals.molbev.a003868.
  33. Gogarten JP, Kibak H, Dittrich P, Taiz L, Bowman EJ, Bowman BJ, Manolson MF, Poole RJ, Date T, Oshima T, et al. Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc Natl Acad Sci U S A. 1989:86(17):6661–6665. 10.1073/pnas.86.17.6661.
  34. Goloboff PA, Pittman M, Pol D, Xu X. Morphological data sets fit a common mechanism much more poorly than DNA sequences and call into question the Mkv model. Syst Biol. 2019:68(3):494–504. 10.1093/sysbio/syy077.
  35. Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012:40(D1):D1178–D1186. 10.1093/nar/gkr944.
  36. Grealey J, Lannelongue L, Saw W-Y, Marten J, Méric G, Ruiz-Carmona S, Inouye M. The carbon footprint of bioinformatics. Mol Biol Evol. 2022:39(3):msac034. 10.1093/molbev/msac034.
  37. Harrison PW, Amode MR, Austine-Orimoloye O, Azov AG, Barba M, Barnes I, Becker A, Bennett R, Berry A, Bhai J, et al. Ensembl 2024. Nucleic Acids Res. 2024:52(D1):D891–D899. 10.1093/nar/gkad1049.
  38. Hedges SB, Marin J, Suleski M, Paymer M, Kumar S. Tree of life reveals clock-like speciation and diversification. Mol Biol Evol. 2015:32(4):835–845. 10.1093/molbev/msv037.
  39. Heled J, Drummond AJ. Bayesian inference of species trees from multilocus data. Mol Biol Evol. 2010:27(3):570–580. 10.1093/molbev/msp274.
  40. Hernández-Plaza A, Szklarczyk D, Botas J, Cantalapiedra CP, Giner-Lamia J, Mende DR, Kirsch R, Rattei T, Letunic I, Jensen LJ, et al. eggNOG 6.0: enabling comparative genomics across 12 535 organisms. Nucleic Acids Res. 2023:51(D1):D389–D394. 10.1093/nar/gkac1022.
  41. Husnik F, McCutcheon JP. Functional horizontal gene transfer from bacteria to eukaryotes. Nat Rev Microbiol. 2018:16(2):67–79. 10.1038/nrmicro.2017.137.
  42. Kapli P, Flouri T, Telford MJ. Systematic errors in phylogenetic trees. Curr Biol. 2021:31(2):R59–R64. 10.1016/j.cub.2020.11.043.
  43. Keeling PJ. Horizontal gene transfer in eukaryotes: aligning theory with data. Nat Rev Genet. 2024:25(6):416–430. 10.1038/s41576-023-00688-5.
  44. Kingman JFC. The coalescent. Stoch Process Their Appl. 1982:13(3):235–248. 10.1016/0304-4149(82)90011-4.
  45. Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed A-W, Kontopoulos D-G, Hilgers L, et al. Integrating gene annotation with orthology inference at scale. Science. 2023:380(6643):eabn3107. 10.1126/science.abn3107.
  46. Kishino H, Thorne JL, Bruno WJ. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol Biol Evol. 2001:18(3):352–361. 10.1093/oxfordjournals.molbev.a003811.
  47. Koonin EV, Dolja VV, Krupovic M. Origins and evolution of viruses of eukaryotes: the ultimate modularity. Virology. 2015:479:2–25. 10.1016/j.virol.2015.02.039.
  48. Kramer AM, Thornlow B, Ye C, De Maio N, McBroome J, Hinrichs AS, Lanfear R, Turakhia Y, Corbett-Detig R. Online phylogenetics with matOptimize produces equivalent trees and is dramatically more efficient for large SARS-CoV-2 phylogenies than de novo and maximum-likelihood implementations. Syst Biol. 2023:72(5):1039–1051. 10.1093/sysbio/syad031.
  49. Krupovic M, Dolja VV, Koonin EV. Origin of viruses: primordial replicators recruiting capsids from hosts. Nat Rev Microbiol. 2019:17(7):449–458. 10.1038/s41579-019-0205-6.
  50. Kubatko LS, Degnan JH. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Syst Biol. 2007:56(1):17–24. 10.1080/10635150601146041.
  51. Kumar S. Embracing green computing in molecular phylogenetics. Mol Biol Evol. 2022:39(3):msac043. 10.1093/molbev/msac043.
  52. Kumar S, Suleski M, Craig JM, Kasprowicz AE, Sanderford M, Li M, Stecher G, Hedges SB. TimeTree 5: an expanded resource for species divergence times. Mol Biol Evol. 2022:39(8):msac174. 10.1093/molbev/msac174.
  53. Landan G, Graur D. Characterization of pairwise and multiple sequence alignment errors. Gene. 2009:441(1-2):141–147. 10.1016/j.gene.2008.05.016.
  54. Lartillot N, Philippe H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 2004:21(6):1095–1109. 10.1093/molbev/msh112.
  55. Maddison WP. Gene trees in species trees. Syst Biol. 1997:46(3):523–536. 10.1093/sysbio/46.3.523.
  56. Mahendrarajah TA, Moody ERR, Schrempf D, Szánthó LL, Dombrowski N, Davín AA, Pisani D, Donoghue PCJ, Szöllősi GJ, Williams TA, et al. ATP synthase evolution on a cross-braced dated tree of life. Nat Commun. 2023:14(1):7456. 10.1038/s41467-023-42924-w.
  57. Malik AJ, Poole AM, Allison JR. Structural phylogenetics with confidence. Mol Biol Evol. 2020:37(9):2711–2726. 10.1093/molbev/msaa100.
  58. Mello B, Schrago CG. Modeling substitution rate evolution across lineages and relaxing the molecular clock. Genome Biol Evol. 2024:16(9):evae199. 10.1093/gbe/evae199.
  59. Mello B, Tao Q, Tamura K, Kumar S. Fast and accurate estimates of divergence times from big data. Mol Biol Evol. 2017:34(1):45–50. 10.1093/molbev/msw247.
  60. Mirarab S, Nakhleh L, Warnow T. Multispecies coalescent: theory and applications in phylogenetics. Annu Rev Ecol Evol Syst. 2021:52(1):247–268. 10.1146/annurev-ecolsys-012121-095340.
  61. Mirarab S, Reaz R, Bayzid MS, Zimmermann T, Swenson MS, Warnow T. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics. 2014:30(17):i541–i548. 10.1093/bioinformatics/btu462.
  62. Moody ERR, Álvarez-Carretero S, Mahendrarajah TA, Clark JW, Betts HC, Dombrowski N, Szánthó LL, Boyle RA, Daines S, Chen X, et al. The nature of the last universal common ancestor and its impact on the early Earth system. Nat Ecol Evol. 2024:8(9):1654–1666. 10.1038/s41559-024-02461-1.
  63. Moreira D, López-García P. Ten reasons to exclude viruses from the tree of life. Nat Rev Microbiol. 2009:7(4):306–311. 10.1038/nrmicro2108.
  64. Morel B, Schade P, Lutteropp S, Williams TA, Szöllősi GJ, Stamatakis A. SpeciesRax: a tool for maximum likelihood species tree inference from gene family trees under duplication, transfer, and loss. Mol Biol Evol. 2022:39(2):msab365. 10.1093/molbev/msab365.
  65. Morel B, Williams TA, Stamatakis A, Szöllősi GJ. AleRax: a tool for gene and species tree co-estimation and reconciliation under a probabilistic model of gene duplication, transfer, and loss. Bioinformatics. 2024:40(4):btae162. 10.1093/bioinformatics/btae162.
  66. O’Malley MA, Koonin EV. How stands the tree of life a century and a half after the origin? Biol Direct. 2011:6(1):32. 10.1186/1745-6150-6-32.
  67. Pamilo P, Nei M. Relationships between gene trees and species trees. Mol Biol Evol. 1988:5(5):568–583. 10.1093/oxfordjournals.molbev.a040517.
  68. Pavlopoulos GA, Baltoumas FA, Liu S, Selvitopi O, Camargo AP, Nayfach S, Azad A, Roux S, Call L, Ivanova NN, et al. Unraveling the functional dark matter through global metagenomics. Nature. 2023:622(7983):594–602. 10.1038/s41586-023-06583-7.
  69. Petitjean C, Deschamps P, López-García P, Moreira D. Rooting the domain Archaea by phylogenomic analysis supports the foundation of the new kingdom Proteoarchaeota. Genome Biol Evol. 2014:7(1):191–204. 10.1093/gbe/evu274.
  70. Rannala B, Yang Z. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics. 2003:164(4):1645–1656. 10.1093/genetics/164.4.1645.
  71. Rasmussen MD, Kellis M. Unified modeling of gene duplication, loss, and coalescence using a locus tree. Genome Res. 2012:22(4):755–765. 10.1101/gr.123901.111.
  72. dos Reis M, Yang Z. Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times. Mol Biol Evol. 2011:28(7):2161–2172. 10.1093/molbev/msr045.
  73. Sagan L. On the origin of mitosing cells. J Theor Biol. 1967:14(3):225–IN6. 10.1016/0022-5193(67)90079-3.
  74. Sánchez Reyes LL, McTavish EJ, O’Meara B. DateLife: leveraging databases and analytical tools to reveal the dated tree of life. Syst Biol. 2024:73(2):470–485. 10.1093/sysbio/syae015.
  75. Shih PM, Matzke NJ. Primary endosymbiosis events date to the later Proterozoic with cross-calibrated phylogenetic dating of duplicated ATPase proteins. Proc Natl Acad Sci U S A. 2013:110(30):12355–12360. 10.1073/pnas.1305813110.
  76. Solís-Lemus C, Bastide P, Ané C. PhyloNetworks: a package for phylogenetic networks. Mol Biol Evol. 2017:34(12):3292–3298. 10.1093/molbev/msx235.
  77. Spang A, Mahendrarajah TA, Offre P, Stairs CW. Evolving perspective on the origin and diversification of cellular life and the virosphere. Genome Biol Evol. 2022:14(6):evac034. 10.1093/gbe/evac034.
  78. Steenwyk JL, Buida TJ, Li Y, Shen X-X, Rokas A. ClipKIT: a multiple sequence alignment trimming software for accurate phylogenomic inference. PLoS Biol. 2020:18(12):e3001007. 10.1371/journal.pbio.3001007.
  79. Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, Iyer R, Schatz MC, Sinha S, Robinson GE. Big data: astronomical or genomical? PLoS Biol. 2015:13(7):e1002195. 10.1371/journal.pbio.1002195.
  80. Suchard MA, Redelings BD. BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny. Bioinformatics. 2006:22(16):2047–2048. 10.1093/bioinformatics/btl175.
  81. Szöllősi GJ, Boussau B, Abby SS, Tannier E, Daubin V. Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations. Proc Natl Acad Sci U S A. 2012:109(43):17513–17518. 10.1073/pnas.1202997109.
  82. Szöllősi GJ, Höhna S, Williams TA, Schrempf D, Daubin V, Boussau B. Relative time constraints improve molecular dating. Syst Biol. 2022:71(4):797–809. 10.1093/sysbio/syab084.
  83. Szöllősi GJ, Rosikiewicz W, Boussau B, Tannier E, Daubin V. Efficient exploration of the space of reconciled gene trees. Syst Biol. 2013:62(6):901–912. 10.1093/sysbio/syt054.
  84. Tajima F. Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983:105(2):437–460. 10.1093/genetics/105.2.437.
  85. Tan G, Muffato M, Ledergerber C, Herrero J, Goldman N, Gil M, Dessimoz C. Current methods for automated filtering of multiple sequence alignments frequently worsen single-gene phylogenetic inference. Syst Biol. 2015:64(5):778–791. 10.1093/sysbio/syv033.
  86. Tao Q, Barba-Montoya J, Huuki LA, Durnan MK, Kumar S. Relative efficiencies of simple and complex substitution models in estimating divergence times in phylogenomics. Mol Biol Evol. 2020:37(6):1819–1831. 10.1093/molbev/msaa049.
  87. Tria FDK, Landan G, Dagan T. Phylogenetic rooting using minimal ancestor deviation. Nat Ecol Evol. 2017:1(7):0193. 10.1038/s41559-017-0193.
  88. Truszkowski J, Perrigo A, Broman D, Ronquist F, Antonelli A. Online tree expansion could help solve the problem of scalability in Bayesian phylogenetics. Syst Biol. 2023:72(5):1199–1206. 10.1093/sysbio/syad045.
  89. Tuffley C, Steel M. Modeling the covarion hypothesis of nucleotide substitution. Math Biosci. 1998:147(1):63–91. 10.1016/S0025-5564(97)00081-3.
  90. Uchiyama I, Mihara M, Nishide H, Chiba H, Kato M. MBGD update 2018: microbial genome database based on hierarchical orthology relations covering closely related and distantly related comparisons. Nucleic Acids Res. 2019:47(D1):D382–D389. 10.1093/nar/gky1054.
  91. Wang H-C, Minh BQ, Susko E, Roger AJ. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst Biol. 2018:67(2):216–235. 10.1093/sysbio/syx068.
  92. Wen D, Yu Y, Zhu J, Nakhleh L. Inferring phylogenetic networks using PhyloNet. Syst Biol. 2018:67(4):735–740. 10.1093/sysbio/syy015.
  93. Williams TA, Schrempf D, Szöllősi GJ, Cox CJ, Foster PG, Embley TM. Inferring the deep past from molecular data. Genome Biol Evol. 2021:13(5):evab067. 10.1093/gbe/evab067.
  94. Williams TA, Szöllősi GJ, Spang A, Foster PG, Heaps SE, Boussau B, Ettema TJG, Embley TM. Integrative modeling of gene and genome evolution roots the archaeal tree of life. Proc Natl Acad Sci U S A. 2017:114(23):E4602–E4611. 10.1073/pnas.1618463114.
  95. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A. 1977:74(11):5088–5090. 10.1073/pnas.74.11.5088.
  96. Wolfe JM, Fournier GP. Horizontal gene transfer constrains the timing of methanogen evolution. Nat Ecol Evol. 2018:2(5):897–903. 10.1038/s41559-018-0513-7.
  97. Yang Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol. 1994:39(3):306–314. 10.1007/BF00160154.
  98. Zhang C, Scornavacca C, Molloy EK, Mirarab S. ASTRAL-Pro: quartet-based species-tree inference despite paralogy. Mol Biol Evol. 2020:37(11):3292–3307. 10.1093/molbev/msaa139.
  99. Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, Belda-Ferre P, Al-Ghalith GA, Kopylova E, McDonald D, et al. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat Commun. 2019:10(1):5477. 10.1038/s41467-019-13443-4.

Data Availability Statement

No new data were generated or analyzed in support of this research.


Articles from Genome Biology and Evolution are provided here courtesy of Oxford University Press