PeerJ. 2019 Feb 14;7:e6399. doi: 10.7717/peerj.6399

Table 2. Challenges in the fields of coalescent-based phylogenomics and implications for unraveling character evolution and the Tree of Life.

Columns: Category | Challenge | Proposed strategy
Category: Data
Challenge: Integration and assessment of large amounts of data with heterogeneous phylogenetic signal.
Proposed strategy: Protocols for marker selection should assess markers' biological relevance and adequacy for the study organism at the temporal and spatial scales in question, not only their logistical convenience. A posteriori marker selection (i.e., after data generation) from whole-genome alignments can help inform these choices and minimize the effects of missing data and uneven data quality. Until such whole-genome data become available, researchers should aim for greater standardization of markers to facilitate combined analyses.
To distinguish true phylogenomic heterogeneity from noise and error, and to identify violations of the multispecies coalescent (MSC), large phylogenomic datasets should be filtered on the basis of the biological and statistical properties of markers (e.g., analyses of gene-tree outliers and rogue taxa).
Further research is needed on filtering methods and on their impact on phylogenomic estimation.
Category: Data
Challenge: Inclusion of additional character types in phylogenomic analyses.
Proposed strategy: Focus research on identifying and using rare genomic changes other than nucleotide substitutions, such as indels, transpositions, inversions, copy number variants (CNVs), and chromosomal rearrangements. Develop new methods both to infer phylogenetic hypotheses from these characters and to integrate them with more traditional sequence data.
Category: Phylogenetic inference models and methods
Challenge: Analysis of genome-scale data for large numbers of species within a coalescent framework.
Proposed strategy: Continue developing models and methods that estimate gene trees and species trees simultaneously in a Bayesian framework (e.g., Ogilvie, Bouckaert & Drummond, 2017) and that scale to increasingly large and complex datasets.
In the meantime, two-step methods (which first estimate gene trees and then infer the species tree from them), particularly those based on biological models that permit statistical tests of topologies in a likelihood framework (e.g., Liu, Yu & Edwards, 2010), are useful tools for incorporating coalescent information into species tree inference.
Category: Phylogenetic inference models and methods
Challenge: Detection of violations of the MSC and their incorporation into phylogenomic inference.
Proposed strategy: Extensions of the MSC should aim to infer reticulate evolutionary histories (i.e., the multispecies network coalescent; Yu et al., 2014; Wen et al., 2016) while simultaneously accommodating violations of the MSC (e.g., reticulation and incomplete lineage sorting (ILS), Wen & Nakhleh, 2018; allopolyploidy, Jones, Sagitov & Oxelman, 2013). Inference methods that handle gene duplication and loss (GDL) and recombination are a high priority.
Further develop conceptual approaches for detecting and quantifying the distinct biological processes underlying phylogenetic history (e.g., ILS and migration, Jones, 2018; hybridization, Blischak et al., 2018; direction and timing of introgression, Hibbins & Hahn, 2018; ILS and GDL, Sousa et al., 2017; recombination rates, Li et al., 2018). As proposed by Degnan (2018), using the MSC as a null model in a model-selection framework can be a powerful way to identify violations of the MSC and to deepen our understanding of the biological consequences of these processes.
Category: Models that integrate phylogenomics and comparative analyses
Challenge: Integrating different phylogenetic signals into comparative analyses.
Proposed strategy: Methods and models should incorporate gene tree incongruence into macroevolutionary models of character evolution. Similarly, integrative studies should aim to unravel character evolution at the molecular, phenotypic, and clade levels.
Category: Models that integrate phylogenomics and comparative analyses
Challenge: Understanding the genomic bases of character evolution in species trees vs. gene trees.
Proposed strategy: Develop methods that estimate phenotype-genotype associations while incorporating heterogeneity across gene trees, or that at least account for differences in character-state probabilities arising from gene tree discordance (e.g., Guerrero & Hahn, 2018). Similarly, extensions of the MSC to quantitative traits that account for genealogical heterogeneity are a promising avenue for research and implementation (e.g., Mendes et al., 2018).
Category: Best practices for building the Tree of Life (ToL)
Challenge: Increasing the number of species represented in the ToL while ensuring reproducibility and encouraging community participation.
Proposed strategy: Natural history museums must play a central role in providing and analyzing genome-scale data. Genetic resources and specimen collections are fundamental for obtaining data from extinct and poorly accessible species. Open-access community initiatives must remain relevant repositories of the ToL, and adequate methods for curating data and analytical tools must remain a high priority.
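The Data category recommends a posteriori marker selection that limits the effects of missing data and uneven data quality. The following is a minimal Python sketch of one such filter, not the authors' protocol. It assumes each locus is a dict mapping taxon names to aligned sequences and treats '-', '?', 'N' and 'n' as missing. The function names `locus_stats` and `select_loci` and the thresholds `min_occupancy=0.75` and `max_missing=0.5` are hypothetical illustrations.

```python
# Hypothetical a posteriori locus filter: keep loci that are sampled for enough
# taxa and are not dominated by missing data. Thresholds are illustrative only.
MISSING = set("-?Nn")

def locus_stats(alignment: dict[str, str], taxa: list[str]) -> tuple[float, float]:
    """Return (taxon occupancy, fraction of missing characters among present taxa)."""
    # A taxon counts as present only if its sequence has at least one observed base.
    present = [
        alignment[t] for t in taxa
        if t in alignment and any(c not in MISSING for c in alignment[t])
    ]
    occupancy = len(present) / len(taxa)
    total = sum(len(s) for s in present)
    missing = sum(c in MISSING for s in present for c in s)
    return occupancy, (missing / total if total else 1.0)

def select_loci(loci: dict[str, dict[str, str]], taxa: list[str],
                min_occupancy: float = 0.75, max_missing: float = 0.5) -> list[str]:
    """Return the names of loci that pass both the occupancy and missing-data thresholds."""
    kept = []
    for name, alignment in loci.items():
        occupancy, missing = locus_stats(alignment, taxa)
        if occupancy >= min_occupancy and missing <= max_missing:
            kept.append(name)
    return kept

if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    loci = {
        "locus1": {"A": "ACGT", "B": "ACGA", "C": "AC-T", "D": "ACGT"},
        "locus2": {"A": "ACGT", "B": "NNNN"},                 # only one informative taxon
        "locus3": {"A": "A---", "B": "A---", "C": "ACG-"},    # mostly gaps
    }
    print(select_loci(loci, taxa))  # ['locus1']
```

In practice the thresholds would be tuned to the study's taxon sampling and data quality rather than fixed in advance.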
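The Data category calls for filtering gene-tree outliers. The sketch below shows one simple, hypothetical criterion. It flags gene trees whose mean Robinson-Foulds (RF) distance to all other gene trees exceeds the median plus k times the median absolute deviation (MAD). It assumes rooted gene trees on a shared taxon set that have already been reduced to sets of non-trivial clades, since Newick parsing is omitted. The cutoff `k = 3.0` and the helper names are illustrative choices, not a published filtering method; dedicated tools use other criteria.

```python
from statistics import median

def rf_distance(a: set[frozenset[str]], b: set[frozenset[str]]) -> int:
    """Rooted Robinson-Foulds distance: number of clades present in only one of the two trees."""
    return len(a ^ b)

def flag_outlier_trees(trees: dict[str, set[frozenset[str]]], k: float = 3.0) -> list[str]:
    """Flag gene trees whose mean RF distance to the others exceeds median + k * MAD.

    Requires at least two trees. When MAD is 0 (most trees identical),
    any tree whose mean distance is above the median is flagged.
    """
    names = list(trees)
    mean_rf = {
        n: sum(rf_distance(trees[n], trees[m]) for m in names if m != n) / (len(names) - 1)
        for n in names
    }
    values = list(mean_rf.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return [n for n in names if mean_rf[n] > center + k * mad]

def clades(*groups: str) -> set[frozenset[str]]:
    """Helper: build a clade set from strings such as 'AB' (one taxon per character)."""
    return {frozenset(g) for g in groups}

if __name__ == "__main__":
    trees = {
        "g1": clades("AB", "ABC", "DE"),   # ((A,B),C),(D,E)
        "g2": clades("AB", "ABC", "DE"),
        "g3": clades("AB", "ABC", "DE"),
        "g4": clades("AD", "ADE", "BC"),   # ((A,D),E),(B,C)
    }
    print(flag_outlier_trees(trees))  # ['g4']
```

A flagged tree is only a candidate for inspection. It may reflect error, but it may also carry genuine signal of a process such as introgression.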
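The Data category lists rare genomic changes, such as the presence or absence of an insertion. For characters of this kind, a classic first check is pairwise character compatibility. Two binary characters can be mapped onto the same unrooted tree without homoplasy only if at most three of the four state combinations (0,0), (0,1), (1,0) and (1,1) occur across taxa (the four-gamete test). An incompatible pair therefore flags a conflict, for example from homoplasy, incomplete lineage sorting, or scoring error. The sketch below applies the test to hypothetical presence/absence data. It simply skips taxa scored '?' for either character, which is a simplification: compatibility with missing data is more involved in general.

```python
from itertools import combinations

STATES = {"0", "1"}

def compatible(c1: dict[str, str], c2: dict[str, str]) -> bool:
    """Four-gamete test using taxa scored 0/1 for both characters ('?' or absent taxa are skipped)."""
    pairs = {
        (c1[t], c2[t]) for t in c1
        if t in c2 and c1[t] in STATES and c2[t] in STATES
    }
    return len(pairs) < 4

def incompatible_pairs(chars: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return every pair of characters that fails the four-gamete test."""
    return [(a, b) for a, b in combinations(chars, 2) if not compatible(chars[a], chars[b])]

if __name__ == "__main__":
    # Hypothetical insertion presence (1) / absence (0) scores for taxa A-E.
    chars = {
        "ins1": {"A": "1", "B": "1", "C": "0", "D": "0", "E": "0"},
        "ins2": {"A": "1", "B": "1", "C": "1", "D": "0", "E": "0"},
        "ins3": {"A": "0", "B": "1", "C": "1", "D": "0", "E": "?"},
    }
    print(incompatible_pairs(chars))  # [('ins1', 'ins3')]
```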
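Three taxa are enough to make the idea of the MSC as a null model (Degnan, 2018) concrete. Under the MSC, take a rooted species tree ((A,B),C) whose internal branch is t coalescent units long. A gene tree then matches the species tree with probability 1 - (2/3)e^(-t), and each of the two discordant rooted topologies has probability (1/3)e^(-t). Two consequences follow. First, t can be estimated from the observed concordant fraction p as t = -ln(1.5(1 - p)). Second, the two discordant topologies should be equally frequent, so a marked asymmetry may point to a process outside the MSC, such as introgression. The Python sketch below is a minimal illustration under these assumptions, not the authors' method. It treats gene trees as known without error, and the counts and function names are hypothetical. The symmetry test is a chi-square goodness-of-fit test with one degree of freedom.

```python
import math

def triplet_probabilities(t: float) -> tuple[float, float, float]:
    """MSC probabilities of the concordant and the two discordant rooted triplets,
    given the internal branch length t in coalescent units."""
    minor = math.exp(-t) / 3.0
    return 1.0 - 2.0 * minor, minor, minor

def estimate_branch_length(n_concordant: int, n_total: int) -> float:
    """Moment estimate of t from the concordant fraction p: t = -ln(1.5 * (1 - p)), truncated at 0."""
    if n_total <= 0:
        raise ValueError("need at least one gene tree")
    p = n_concordant / n_total
    if p >= 1.0:
        return math.inf  # every gene tree is concordant: t is unbounded
    return max(0.0, -math.log(1.5 * (1.0 - p)))

def discordance_symmetry_test(n_disc1: int, n_disc2: int) -> tuple[float, float]:
    """Chi-square test (1 df) of the MSC prediction that both discordant triplets are equally frequent."""
    total = n_disc1 + n_disc2
    if total == 0:
        raise ValueError("no discordant gene trees")
    expected = total / 2.0
    chi2 = ((n_disc1 - expected) ** 2 + (n_disc2 - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of a chi-square with 1 df
    return chi2, p_value

if __name__ == "__main__":
    # Hypothetical counts: 600 concordant gene trees, 230 and 170 of the two discordant ones.
    t_hat = estimate_branch_length(600, 1000)
    print(round(t_hat, 3))  # 0.511
    print(discordance_symmetry_test(230, 170))
```

With these made-up counts, the asymmetry between the two discordant topologies (chi-square = 9.0) would reject the MSC expectation at conventional levels. Systematic gene-tree estimation error is one alternative explanation that would need to be excluded before invoking introgression.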