Systematic Biology. 2022 Dec 28;72(1):161–178. doi: 10.1093/sysbio/syac062

Exploring Conflicts in Whole Genome Phylogenetics: A Case Study Within Manakins (Aves: Pipridae)

Min Zhao, Sarah M Kurtis, Noor D White, Andre E Moncrieff, Rafael N Leite, Robb T Brumfield, Edward L Braun, Rebecca T Kimball
Editor: Sara Ruane
PMCID: PMC10452962  PMID: 36130303

Abstract

Some phylogenetic problems remain unresolved even when large amounts of sequence data are analyzed and methods that accommodate processes such as incomplete lineage sorting are employed. In addition to investigating biological sources of phylogenetic incongruence, it is also important to reduce noise in the phylogenomic dataset by using an appropriate filtering approach that addresses gene tree estimation errors. We present the results of a case study in manakins, focusing on the very difficult clade comprising the genera Antilophia and Chiroxiphia. Previous studies suggest that Antilophia is nested within Chiroxiphia, though relationships among Antilophia+Chiroxiphia species have been highly unstable. We extracted more than 11,000 loci (ultra-conserved elements and introns) from whole genomes and conducted analyses using concatenation and multispecies coalescent methods. Topologies resulting from analyses using all loci differed depending on the data type and analytical method, with 2 clades (Antilophia+Chiroxiphia and Manacus+Pipra+Machaeropterus) in the manakin tree showing incongruent results. We hypothesized that gene trees that conflicted with a long coalescent branch (e.g., the branch uniting Antilophia+Chiroxiphia) might be enriched for cases of gene tree estimation error, so we conducted analyses that either constrained those gene trees to include monophyly of Antilophia+Chiroxiphia or excluded these loci. While constraining trees reduced some incongruence, excluding the trees led to completely congruent species trees, regardless of the data type or model of sequence evolution used. We found that a suite of gene metrics (most importantly the number of informative sites and likelihood of intralocus recombination) collectively explained the loci that resulted in non-monophyly of Antilophia+Chiroxiphia.
We also found evidence for introgression that may have contributed to the discordant topologies we observe in Antilophia+Chiroxiphia and led to deviations from expectations given the multispecies coalescent model. Our study highlights the importance of identifying factors that can obscure phylogenetic signal when dealing with recalcitrant phylogenetic problems, such as gene tree estimation error, incomplete lineage sorting, and reticulation events. [Birds; c-gene; data type; gene estimation error; model fit; multispecies coalescent; phylogenomics; reticulation]


The increase in sequence data has introduced the challenge that gene trees often yield conflicting topologies due to the heterogeneity of gene histories (Pamilo and Nei 1988; Maddison 1997; Degnan and Rosenberg 2006; Edwards 2009). Conflicting signals can arise from various sources, such as incomplete lineage sorting (ILS), recombination, or reticulation events like introgression, hybridization, and horizontal gene transfer (reviewed by Degnan and Rosenberg 2009). Advances in next-generation sequencing technologies have facilitated examination of some of these complex hypotheses with substantial sequence data sampled across the genome (e.g., Chen et al. 2019; Meleshko et al. 2021). However, some phylogenetic problems have remained unresolved despite the use of large amounts of unlinked sequence data, for example, the early evolution of metazoans (Philippe et al. 2009; Simion et al. 2017; Pandey and Braun 2020), the root of placental mammals (McCormack et al. 2012; Song et al. 2012), and the early divergences of Neoaves (McCormack et al. 2013; Jarvis et al. 2014; Prum et al. 2015).

In addition to deep divergences, rapid radiations can also be challenging for phylogenetic resolution as they provide little time for genetic markers to accumulate adequate substitutions to resolve phylogenetic relationships, leading to short internal branches or polytomies in the tree (Braun and Kimball 2001). When divergence time is short relative to effective population size, ILS can be especially problematic for phylogenetic inference and the most common gene tree topology can differ from the species tree in a region of tree space called the anomaly zone (Degnan and Rosenberg 2006). It has been shown that concatenation methods can be positively misleading when the species tree is in this zone (Kubatko and Degnan 2007; Roch and Steel 2015; Mendes and Hahn 2018). Multispecies coalescent (MSC) methods can in theory accommodate this problem. However, the incongruence between trees can also result from systematic errors due to nonphylogenetic signals in the data, which cannot be eliminated by the addition of more data, for example, structured noise that results from model violation of the data (Jeffroy et al. 2006; Philippe et al. 2011). Reducing noise in phylogenomic datasets is important for fully resolving difficult phylogenetic questions, but it is not always clear how nonphylogenetic signal will bias phylogenomic inference with empirical datasets.

To alleviate issues associated with potentially problematic data, various data filtering approaches have been proposed previously. These include, for example, selecting stationary genes (Collins et al. 2005) to avoid base compositional bias or selecting genes based on informative sites (e.g., Leite et al. 2021) and missing data (e.g., Hosner et al. 2016) or simply filtering by data type. Different data types can yield discordant phylogenies (e.g., Jarvis et al. 2014), for example, exons, introns, and ultra-conserved elements (UCEs) have all been used widely; however, exons can sometimes yield unstable results compared with introns (e.g., Chen et al. 2017), exhibit more content variation, and do not fit commonly used models of sequence evolution as well as the non-coding regions, such as introns and UCEs (e.g., Reddy et al. 2017). Apart from subsampling loci prior to tree estimation, there are also approaches that subsample gene trees as the input for coalescent methods to address estimation error, for example, subsampling based on gene tree topological incongruence (e.g., Arcila et al. 2017), gene tree distances (e.g., Simmons et al. 2016), and gene tree bootstrap support (e.g., Salichos and Rokas 2013). However, these various filtering approaches have not always been effective at reconciling discordance, and the best approach(es) to use, if any, are not clear. Some locus subsampling approaches can reduce the accuracy of summary methods; this has been shown for filtering loci based on missing data (Molloy and Warnow 2018) and for subsampling based on a single gene property (Mongiardino Koch 2021). Especially for datasets like the ones analyzed in this study, where the loci are relatively long and informative because they were generated by whole genome sequencing, many commonly used filtering methods (like those based on locus informativeness) may be inappropriate.

A different filtering criterion that does not rely on any individual gene property might be more appropriate when estimated gene trees are based on highly informative genomic regions: removal of gene trees that conflict with a clade that has been well supported from previous work, and where the branch leading to this clade is long in coalescent units such that gene trees characterized by deep coalescence are unlikely. In these cases, gene trees that conflict with this branch are likely enriched for estimation errors. The rationale for this filtering approach is based on the spectrum of expected c-gene lengths. c-genes (coalescence genes) are genomic segments each with a single branching history (Doyle 1995, 1997). Since deep coalescence trees, by definition, have more ancient divergences than shallow coalescence trees, a c-gene associated with a deep coalescence gene tree will have experienced more recombination events that reduce the length of that c-gene. For example, for human–chimpanzee–gorilla divergence, the shallow coalescence c-genes were estimated to range from 532 to 2710 bp in length, whereas the deep coalescence c-genes ranged from 41 to 65 bp (Hobolth et al. 2007). This further leads to the expectation that c-genes associated with deep coalescence gene trees will be similar in length to c-genes for shallow coalescence gene trees when the focal branch is short (i.e., very little time for recombination), but deep coalescence c-genes will be much shorter than shallow coalescence c-genes when the focal branch in the species tree is long (Hobolth et al. 2007).

The expectation that deep coalescence c-genes will be short leads us to a fundamental question: why would analyses of relatively long regions yield estimated gene trees that appear to reflect very deep coalescences? After all, we expect the majority of sites in any relatively long genomic segment to have a shallow coalescence. The simplest explanation for any apparent deep coalescence gene trees is estimation error, regardless of whether that error is stochastic (e.g., Meiklejohn et al. 2016) or systematic (e.g., Richards et al. 2018). Alternatively, the analyzed region could have been subject to intralocus recombination and therefore include 2 or more c-genes, at least one of which is a deep coalescence gene. However, intralocus recombination can itself lead to erroneous estimation of phylogeny (see Schierup and Hein 2000). Finally, it is possible that the deep coalescence gene tree is real and the region in question is an exceptionally long c-gene with that topology—though we expect these to be rare. If deep coalescence gene trees are enriched for erroneous gene trees, they should be correlated with known sources of gene tree estimation error (e.g., loci with limited phylogenetic information, deviations from base compositional stationarity, or evidence for intralocus recombination). In this case, removing those gene trees from consideration could improve estimation of the species tree.

We present the results of a case study within manakins (Aves: Pipridae), which are known for their strong sexual dimorphism, elaborate courtship displays in leks (including choreographed multi-male displays for some species), and diverse coloration patterns (Snow 1963; Sick 1967). Our primary focus was on the Antilophia and Chiroxiphia clade which comprises 7 extant species. There is consistent evidence showing that Antilophia is nested within Chiroxiphia, rendering it paraphyletic (Tello et al. 2009; Ohlson et al. 2013; Silva et al. 2018; Harvey et al. 2020; Leite et al. 2021). Although previous phylogenetic studies agreed on the monophyly of Antilophia+Chiroxiphia, relationships within this clade remain unstable across analyses. Leite et al. (2021) used ~2200 UCEs with an average length of ~640 bp but still produced conflicting topologies depending on choice of analytical method and informative site filtering scheme for this clade. Therefore, it is worthwhile to revisit this problem with an additional data type, more and longer loci, and more informative sites.

We collected genomic data for 14 manakins, and added data from 6 published genomes, for a total of 18 species (20 individuals) sampled across the family Pipridae. We extracted UCEs (average length over 2000 bp) and introns, with a total dataset comprising more than 11,000 loci—many more loci and informative sites than previous phylogenomic work on this group (e.g., Harvey et al. 2020; Leite et al. 2021). Our goals were to 1) examine whether introducing more data/data types helps resolve the incongruence observed for Antilophia+Chiroxiphia in previous studies; 2) investigate differences between data types, models of sequence evolution and analytical methods; and 3) explore the potential sources of incongruence, such as gene tree estimation error, ILS, and reticulation events. To address these questions, we first conducted phylogenomic analyses using concatenation and MSC approaches to examine whether UCEs and introns estimate the same topology and whether concatenation and MSC methods agree. We then chose a branch with long coalescent branch length in the species tree and used this branch to define an expected monophyletic clade. We used non-monophyly of this clade to identify the gene trees likely to be inaccurate, and subsampled loci and their associated gene trees. We performed tests of stationarity, homogeneity, and signals of recombination and used a logistic regression model to identify the metrics that best explained the loci that did not recover monophyly of the expected monophyletic clade. We also explored the potential biological sources of phylogenetic incongruence (ILS and reticulation) using relative frequency analysis, ABBA-BABA tests, and phylogenetic network analysis.

Materials and Methods

Taxon Sampling and Sequencing

We obtained fresh tissue samples for 14 individuals of 13 manakin species in Pipridae (Supplementary Table S1), including 2 individuals to represent the 2 morphologically distinct groups of Lepidothrix coronata: L. c. minuscula (L. velutina minuscula in Moncrieff et al. 2022) and L. c. exquisita. We extracted DNA from these 14 samples and obtained low-coverage whole genome sequencing reads with an average depth of 9× (see White et al. 2022 for details). Raw genome reads are available under BioProject PRJNA727529 in the NCBI SRA database. We then performed quality control and trimmed raw sequence reads to eliminate adapter contamination using Trimmomatic v0.39 (Bolger et al. 2014) with default settings. We also downloaded all available GenBank genome assembly data for 6 manakin species, including L. c. coronata, and for Empidonax traillii (Willow Flycatcher), which represents the closely related family Tyrannidae, as the outgroup. In total, we sampled 20 individuals of 18 manakin species from 13 of 17 named genera and 5 of 7 Antilophia+Chiroxiphia species (Supplementary Table S1).

Data Processing

We searched for UCEs in the 7 NCBI genome assemblies using the UCE 5K probe sequences, which are available from the PHYLUCE documentation (Faircloth 2016), and extracted 1000 bases of flanking sequence on each side of the conserved UCE core. As Neopelma chrysocephalum is equally distant to all of our other sampled manakin taxa according to Leite et al. (2021), we used it as a reference to map raw reads onto UCEs to avoid assembling some taxa on a closer reference sequence than other taxa. Since some UCEs may be near other UCEs, CAP3 (Huang and Madan 1999) was employed to identify and assemble overlapping UCEs into contigs for Neopelma chrysocephalum, which in total yielded 188 UCE contigs and 4495 UCE singletons. For the 188 Neopelma UCE contigs, we used BLAST+ (Camacho et al. 2009) to search against the other 6 genomes and extracted the best hits by e-value.

For our shotgun sequencing data, we aligned paired-end sequencing reads of the 14 manakin individuals to the Neopelma UCEs (4683 sequences, contigs plus singletons) using the alignment algorithm BWA-MEM (Li 2013) implemented in BWA v0.7.17. We then used SAMtools v1.10 (Li et al. 2009) to sort the SAM file and convert it to a BAM file for each individual. We kept only alignments with a mapping quality (MAPQ) of 60 or greater and removed PCR duplicates. We called sequence variants using freeBayes v1.3.2 (Garrison and Marth 2012) and used BCFtools v1.5 (Li 2011) to extract all of the genotype entries and create a consensus FASTA file for each haplotype. We generated consensus sequences from 2 haplotypes for each locus and replaced heterozygous sites with IUPAC ambiguity characters using a custom Perl script. We aligned UCEs from both sources (raw sequencing reads and GenBank genome assembly) and built alignments using MAFFT v7.407 (Katoh and Standley 2013). We retained the UCEs that contained at least 18 taxa (90% of taxa) for downstream phylogenomic analyses.
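The haplotype-merging step described above can be illustrated with a minimal Python sketch. It is not the custom script used in the study: the function name `diploid_consensus` is hypothetical, and the choice to emit N for gaps or unexpected characters is an assumption made only for this example.

```python
# Two-base IUPAC ambiguity codes for heterozygous sites.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def diploid_consensus(hap1: str, hap2: str) -> str:
    """Merge two aligned haplotypes of one locus into a single sequence.

    Hypothetical helper: homozygous sites are copied through and
    heterozygous sites are replaced by the matching IUPAC code.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have equal length")
    merged = []
    for a, b in zip(hap1.upper(), hap2.upper()):
        if a == b:
            merged.append(a)
        else:
            # Assumption: any pair that is not two distinct nucleotides
            # (e.g., a gap against a base) is written as N.
            merged.append(IUPAC.get(frozenset((a, b)), "N"))
    return "".join(merged)
```

For instance, `diploid_consensus("ACGT", "ACAT")` yields a sequence whose third site is R, the code for an A/G heterozygote.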

For introns, we used a reference dataset of 7057 intron alignments for 48 avian species (including Manacus vitellinus) obtained from Jarvis et al. (2014). In some cases, multiple introns from the same gene were included. These alignments were used as queries to search for introns from genome assemblies with Extract_seq.pl, a data extraction pipeline (https://github.com/aakanksha12/Extract_seq) that uses the program nhmmer (Wheeler and Eddy 2013) to extract the best match to the query.

For mapping paired-end sequencing reads to the introns, we applied the same pipeline as described above for UCEs, except that we used the Manacus introns as the reference. We built 7057 intron alignments using MAFFT and used a custom Perl script to prune alignments from the ends until there was at least one site with 60% of the genome assembly taxa present and 60% of all taxa present to avoid missing data biases. After pruning, we retained the introns that had at least 18 taxa and were at least 500 bp in length for downstream phylogenomic analyses.

To evaluate genome coverage of our datasets, we BLASTed (Camacho et al. 2009) each locus against the Chiroxiphia lanceolata chromosomes (assembly bChiLan1.pri) using makeblastdb and blastn to determine their chromosomal locations.

Analyses of Concatenated Data

We performed 2 partitioned analyses in IQ-TREE v2.1.0 (Nguyen et al. 2015; Minh et al. 2020b) for each of the 3 datasets (UCEs, introns, and UCEs/introns combined): 1) considering all standard substitution models and only allowing for invariable sites and the discrete gamma model for rate heterogeneity (-m TESTMERGE; traditional models) and 2) considering all previous models as well as the FreeRate heterogeneity model (-m MFP+MERGE; expanded models). All analyses were run for 1000 ultrafast bootstrap replicates (--ufboot 1000) with all partitions sharing the same set of branch lengths but allowing for partition-specific evolutionary rates (edge-proportional, -p). A greedy strategy was implemented to search for the best-fit partition scheme and only the top 10% partition merging schemes were examined to reduce computational burden (-rcluster 10).

Gene Tree and Species Tree Estimation

We performed 2 separate gene tree estimations for every locus in IQ-TREE which considered 2 different model sets discussed above: traditional models (-m TEST), and expanded models (-m MFP). All analyses were run for 1000 ultrafast bootstrap replicates (--ufboot 1000) with zero length branches collapsed (--polytomy). We extracted the model fit for every gene tree from the iqtree files to compare the model fit between traditional and FreeRate models.

We then estimated species trees under the MSC model in ASTRAL 5.7.4 (Mirarab et al. 2014), separately from the gene trees estimated with traditional models and from those estimated with expanded models in IQ-TREE. We also combined UCE and intron gene trees to estimate a species tree in ASTRAL. In addition, we estimated species trees from the UCE and intron concatenated datasets using the site-based coalescent method SVDquartets implemented in PAUP* (Swofford 1998). A 50% majority-rule consensus tree was computed based on 1000 bootstrap replicates using 500,000 random quartets.

Due to the conserved core and variable flanking regions, UCEs exhibit a high degree of among-sites rate heterogeneity. Tagliacollo and Lanfear (2018) propose UCE-specific models to address within-UCE heterogeneity. However, to allow for better comparison to the introns, we wanted to use the same set of models implemented in IQ-TREE. Therefore, to account for rate heterogeneity within UCEs in our data, we also estimated gene trees using the left and right flanking regions (1000 bases on each side) in IQ-TREE and input these in ASTRAL. Since the species tree topologies of the flanking regions were identical to those of the whole UCEs for all ASTRAL analyses, we only present these results in Supplementary Data.

Topological Constraints and Data Filtering

There were a number of long branches in the ASTRAL species trees, and the 2 longest (both >1.8 coalescent units) were the stem branch uniting Antilophia and Chiroxiphia (referred to as Antilophia+Chiroxiphia hereafter) and the stem branch uniting Antilophia, Chiroxiphia, Masius, and Corapipo (Fig. 1). We used non-monophyly of the Antilophia+Chiroxiphia clade to identify gene trees likely to be inaccurate, as this clade was not only united by a very long branch, but also had much more complete taxon sampling compared with the other clade. c-genes are expected to be especially short when they are discordant with a long branch in the species tree. Since our loci are relatively long, we believe that the majority of loci with gene trees that conflict with this clade are likely to result from an erroneous estimate of phylogeny rather than genuine discordance. Genes that produced topologies that conflicted with Antilophia+Chiroxiphia monophyly were identified using custom R scripts that incorporated the findMRCA function in the R package phytools (Revell 2012). We also computed gene concordance factors (Minh et al. 2020a) in IQ-TREE for UCE and intron gene trees estimated with traditional and expanded models, respectively.
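The monophyly screen can be sketched in a few lines of Python. This is an illustrative analogue, not the R/phytools script used in the study: it assumes each gene tree has already been rooted on the outgroup and parsed into nested tuples of taxon labels, and the function names and the abbreviated labels in the usage note are hypothetical.

```python
def leaf_set(node):
    """Return the set of taxon labels below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def is_monophyletic(tree, clade):
    """True if some node of the rooted tree subtends exactly the taxa in `clade`.

    Assumption: `tree` is rooted (e.g., on the outgroup) and written as
    nested tuples whose leaves are strings.
    """
    target = frozenset(clade)

    def visit(node):
        if leaf_set(node) == target:
            return True
        if isinstance(node, str):
            return False
        return any(visit(child) for child in node)

    return visit(tree)
```

With a toy tree such as `((("A_gal", "C_bol"), "C_cau"), ("Masius", "Corapipo"))`, the set `{"A_gal", "C_bol", "C_cau"}` is reported as monophyletic, whereas `{"A_gal", "C_cau"}` is not. A locus whose gene tree fails this check for the focal clade would be flagged as a non-monophyletic locus.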

Figure 1.


ASTRAL species tree of Pipridae containing 18 species, 20 individuals, based on combined gene trees of 4606 UCEs and 6895 introns estimated under the traditional models. Coalescent branch lengths greater than 1 are labeled on tree. Two clades shaded in gray showed topological discordance among analyses. Trees in boxes on right highlight the alternative topologies yielded in our study, color-coded based on shades used in Table 1. Node support values are shown in Table 1. (Note that Lepidothrix coronata minuscula was proposed to be renamed as Lepidothrix velutina minuscula in Moncrieff et al. 2022.)

After we identified the loci whose gene trees did not include a monophyletic Antilophia+Chiroxiphia clade (hereafter referred to as “non-monophyletic loci”), we conducted 2 analyses. First, we excluded the non-monophyletic loci and conducted our 3 major analytical methods: 1) we concatenated the “monophyletic loci” and reran partitioned analysis in IQ-TREE; 2) we generated an ASTRAL species tree using only the gene trees for monophyletic loci; and 3) we reran SVDquartets using the concatenated dataset of monophyletic loci. Second, we re-estimated gene trees (using either traditional or expanded models in IQ-TREE) for these non-monophyletic loci but enforced monophyly of the Antilophia+Chiroxiphia clade, and then we combined constrained gene trees with the original gene trees of the monophyletic loci to summarize an ASTRAL species tree (constr1).

Our initial analyses with unfiltered datasets also revealed conflicting topologies within another clade (Manacus + Pipra + Machaeropterus) that received extremely high support in Leite et al. (2021) (see below). This motivated us to re-estimate all gene trees by simultaneously constraining monophyly of both clades (Antilophia+Chiroxiphia and Manacus+Pipra+Machaeropterus) and summarizing the gene trees to an ASTRAL species tree (constr2).

We used DiscoVista (Sayyari et al. 2018) to summarize all ASTRAL species trees (unfiltered datasets, monophyletic loci only, constr1 and constr2) to show whether an internal branch in Antilophia+Chiroxiphia is supported or rejected by an ASTRAL tree. DiscoVista considers a branch with quartet support above 95% as strongly supported, a branch with 90–95% support as weakly supported, a branch that is not present in the tree but becomes compatible if low support branches (below 90%) are collapsed as compatible (or weakly rejected), and a strongly rejected branch as incompatible.

We calculated average GC content, interquartile range of GC variation across taxa, number of parsimony informative sites, proportion of parsimony informative sites, and proportion of missing characters (Ns and gaps) for each locus to examine the summary statistics of monophyletic and non-monophyletic loci. We tested stationarity and homogeneity using the IQ-TREE symtest (Naser-Khdour et al. 2019), which outputs the number of sequence pairs in each locus alignment that can reject the assumption of stationarity and homogeneity. We also examined intralocus recombination using 3SEQ (Lam et al. 2018), which estimates the number of sequence triplets in each alignment that exhibit evidence for recombination.

We used a generalized linear model to determine whether any of the properties that we measured for individual loci can predict whether analysis of each locus will result in a tree that is monophyletic or not monophyletic for Antilophia+Chiroxiphia. To do this, we standardized the 8 summary statistics to have a mean of zero and a standard deviation of one and then ran logistic regression with standardized variables combined using the glm function in R base package stats (R Core Team 2013). We did this separately for introns, UCEs, and all loci combined. We also performed Spearman’s correlation tests for all pairs of variables and reported the r values in Supplementary Table S2. The parsimony informative sites and proportion of informative sites for UCEs are highly correlated (r = 0.947), since the UCEs in general have similar lengths. Therefore, we only kept the number of parsimony informative sites for the glm analysis of UCEs.
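The standardize-then-regress procedure can be sketched in plain Python. This is not the R `glm` call used in the study: R fits the model by iteratively reweighted least squares, whereas the hypothetical `fit_logistic` below uses simple gradient ascent on the log-likelihood, and the step count and learning rate are arbitrary assumptions. Because every predictor is on the same scale after standardization, the fitted coefficients can be compared directly.

```python
import math


def standardize(values):
    """Rescale a predictor to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def fit_logistic(rows, labels, steps=5000, rate=0.1):
    """Fit a logistic regression by gradient ascent (illustrative only).

    rows   : list of predictor vectors, one per locus (already standardized)
    labels : 1 if the locus recovered the focal clade, else 0
    Returns [intercept, coef_1, ..., coef_k].
    """
    k = len(rows[0])
    n = len(rows)
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of monophyly
            grad[0] += y - p
            for j, xi in enumerate(x):
                grad[j + 1] += (y - p) * xi
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta
```

A positive coefficient for, say, the number of parsimony informative sites would indicate that more informative loci are more likely to recover the focal clade.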

We calculated the absolute Robinson–Foulds (RF) tree distances (Robinson and Foulds 1981) between each gene tree and the reference tree using ETE Python toolkit v3.0 (Huerta-Cepas et al. 2016) with the 2 tentative species tree topologies as the reference tree, respectively.
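For readers unfamiliar with the metric, the absolute RF distance counts the bipartitions (splits) found in one tree but not the other. The sketch below is a standard-library illustration rather than the ETE toolkit routine used in the study; it assumes both trees carry the same taxa and are given as nested tuples, and the function names are hypothetical.

```python
def leaves(node):
    """Return the set of taxon labels below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree, ignoring the root position."""
    all_taxa = leaves(tree)
    reference = min(all_taxa)  # used to pick one canonical side per split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_taxa - side
        if len(side) > 1 and len(other) > 1:
            found.add(side if reference in side else other)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Absolute Robinson-Foulds distance between two trees on the same taxa."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))
```

Two 5-taxon trees that differ only in which pair of taxa is sister within a 3-taxon clade have an RF distance of 2, since each tree has one split the other lacks.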

Test of the MSC Model

We pruned the UCE and intron gene trees estimated under traditional models to only include the 5 species in Antilophia+Chiroxiphia and Masius chrysopterus as outgroup. We then performed relative frequency analysis on pruned gene trees of UCEs, introns, and their monophyletic loci, respectively, in DiscoVista (Sayyari et al. 2018) to visualize the frequency of 3 possible gene tree topologies around a focal branch of their corresponding species tree. As C. lanceolata and C. pareola were well-accepted as close relatives, sometimes treated as conspecific (Snow 2020), we grouped them together to better focus on the internal branches that exhibited discordance. For an unrooted 4-taxon species tree, there are 3 possible gene tree topologies, one concordant with the species tree topology and 2 discordant (Pamilo and Nei 1988). Given the MSC model, if ILS is the only cause of gene tree incongruence, we would observe 1) a majority topology that is congruent with the species tree with a frequency of gene trees higher than 1/3 and 2) 2 minority topologies with equal frequencies (both < 1/3).
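The 2 expectations under the MSC can be checked numerically once the 3 topology counts around a focal branch are in hand. The sketch below is an assumption-laden illustration, not part of the DiscoVista workflow: the function name is hypothetical, and the chi-square test (1 degree of freedom) for equal minority frequencies is one simple choice that the study itself does not describe.

```python
import math


def msc_quartet_check(concordant, alt1, alt2):
    """Compare quartet topology counts with expectations under ILS alone.

    concordant : number of gene trees matching the species tree topology
    alt1, alt2 : numbers of gene trees matching each discordant topology
    Returns (majority_ok, chi2, p_value), where majority_ok is True when the
    concordant topology exceeds 1/3 of gene trees and p_value tests whether
    the 2 minority topologies are equally frequent.
    """
    total = concordant + alt1 + alt2
    if total == 0 or alt1 + alt2 == 0:
        raise ValueError("need at least one discordant gene tree")
    majority_ok = concordant / total > 1 / 3
    expected = (alt1 + alt2) / 2
    chi2 = ((alt1 - expected) ** 2 + (alt2 - expected) ** 2) / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return majority_ok, chi2, p_value
```

A small p-value indicates that the 2 minority topologies are unequally frequent, a pattern that ILS alone does not predict and that is often examined further for gene flow.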

Test for Reticulation Events

To test whether historical reticulation events like introgression and hybridization contribute to the incongruence observed in Antilophia+Chiroxiphia, we performed ABBA-BABA tests (Green et al. 2010; Durand et al. 2011) on all possible combinations of C. pareola, C. lanceolata, A. galeata, C. boliviana, and C. caudata under the 2 tentative species tree topologies with the relationship ((P1, P2), P3), using Masius chrysopterus as the outgroup. D statistics were computed to test for excessive gene flow between the 2 nonsister taxa of a trio. For example, for trio C. boliviana (P1), C. caudata (P2), and A. galeata (P3), a positive D statistic indicates excessive gene flow between C. caudata (P2) and A. galeata (P3), whereas a negative D statistic indicates excessive gene flow between C. boliviana (P1) and A. galeata (P3). We first extracted single-nucleotide polymorphisms (SNPs) using SNP-sites (Page et al. 2016) for each 4-taxon set (a trio and the outgroup) from each locus alignment and wrote a custom script to randomly select one SNP per locus per set. We then combined the SNPs for UCEs, introns and all loci respectively for each set to compute D statistics in the CalcD function of the R package evobiR (Blackmon and Adams 2015) and used 1000 bootstrap iterations to test for a significant deviation from the null hypothesis of D statistic = 0 (no gene flow). We repeated the random selection of SNPs 100 times for UCEs, introns and all loci combined.
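The D statistic itself is a simple contrast of 2 site-pattern counts, D = (ABBA − BABA) / (ABBA + BABA), where A is the outgroup (ancestral) state and B the derived state in the order P1, P2, P3, outgroup. The following sketch counts patterns from 4 aligned sequences; it is not the evobiR CalcD implementation, the function name is hypothetical, and skipping any site that contains ambiguity codes or gaps is an assumption of this example. Bootstrapping for significance is omitted.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) for four aligned sequences.

    Only biallelic sites made of unambiguous A/C/G/T are counted.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        states = (a, b, c, o)
        if any(s not in "ACGT" for s in states) or len(set(states)) != 2:
            continue
        if a == o and b == c:
            abba += 1  # P2 and P3 share the derived state
        elif b == o and a == c:
            baba += 1  # P1 and P3 share the derived state
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / (abba + baba), abba, baba
```

Consistent with the sign convention above, an excess of ABBA sites gives a positive D (P2 and P3 share more derived alleles than expected), and an excess of BABA sites gives a negative D.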

We also estimated phylogenetic networks in PhyloNet v3.8.2 (Wen et al. 2018) which examines the strongest signals of introgression among any taxa, including unsampled ghost lineages, unlike the ABBA-BABA tests which focus on introgression for targeted relationships and can be influenced if ghost lineages are not taken into account (Tricou et al. 2022). We used “MCMC_GT” in PhyloNet to estimate phylogenetic networks which performs Bayesian inference of the posterior distribution of the networks (Wen et al. 2016). To do this, we used the pruned gene trees (pruned to Antilophia+Chiroxiphia and outgroup Masius chrysopterus) and performed separate network searches for UCEs, introns and all loci combined, each with 3 sets of analyses: 1) one reticulation maximum (with MCMC chains running for 1,100,000 generations, sampling every thousand, and a burn-in of first 100,000 generations); 2) 2 reticulations maximum (MCMC chain = 550,000, sampling frequency = 1000, burn-in = 50,000); and 3) 3 reticulations maximum (MCMC chain = 250,000, sampling frequency = 1000, burn-in = 50,000). Each analysis used 3 MCMC chains (1 cold chain and 2 hot chains; temperature list (1.0, 2.0, 3.0)), and we conducted 3 independent runs for each analysis. We chose the number of generations for each MCMC analysis to allow them to complete within the time limit of our clusters (28 days). We then summarized the 3 MCMC runs and assessed mixing of chains by examining the PSRF (potential scale reduction factor) values for posterior, likelihood and prior based on the tutorial in https://wiki.rice.edu/confluence/display/PHYLONET/MCMC_GT (PSRF approaching 1.0 indicates good mixing). Only the top 3 most probable networks in the 95% credible set were considered for each dataset. Networks were visualized using Dendroscope (Huson and Scornavacca 2012).
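The PSRF compares the variance between independent runs with the variance within them. The sketch below uses the textbook Gelman-Rubin formulation on post-burn-in samples of a single quantity (e.g., the log-posterior); whether PhyloNet computes exactly this variant is not stated here, so it should be read as an assumption, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def psrf(chains):
    """Potential scale reduction factor for equally long chains.

    chains : list of lists, one list of post-burn-in samples per run.
    """
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length (>= 2 samples)")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return math.sqrt(pooled / within)
```

When the runs sample the same distribution, the between-run term is small and the PSRF approaches 1.0 as chain length increases; runs stuck in different regions inflate the between-run variance and push the value well above 1.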

Results

Sequencing Data

We obtained an average of 29,624,754 sequence reads per taxon after trimming for the 14 samples. The UCE dataset contained ~10 Mb of data comprising 4606 UCEs with 94.6% data coverage and the intron dataset included ~10.94 Mb data, 6895 loci, with 98.1% coverage. Both UCEs and introns were sampled across the entire genome, covering almost all chromosomes, including the avian sex chromosome, Z (Supplementary Table S3).

Initial Phylogenomic Analyses

Our results overall showed congruent phylogenetic relationships with high support among most of the genera. These relationships were constant across analyses using different datasets, model selections, and choices of method (Supplementary Figs. S1–S17) and corroborated the topologies presented in Leite et al. (2021). The exceptions were the relationships within 2 clades, which differed among analyses (Fig. 1).

For the Antilophia+Chiroxiphia clade, which exhibited discordant relationships in Leite et al. (2021), we found 3 different topologies based on our initial phylogenomic analyses of the complete dataset: T1, T2, and T3 (Fig. 1 and Table 1). The intron dataset introduced topology T3 which had not been reported previously. ASTRAL species trees supported either T1 or T3. Using ML, the UCE concatenated tree shifted from T2 to T1 when models switched from traditional to expanded models, though the intron dataset estimated T3 with both models. The SVDquartets trees, for both UCEs and introns, supported T1 (Supplementary Fig. S1). Thus, relationships within this group varied depending on dataset, model, and type of analysis.

Table 1.

A summary of phylogenomic analyses


Unexpectedly, we also observed incongruence among analyses for the clade containing Manacus vitellinus, Pipra filicauda and Machaeropterus pyrocephalus (Fig. 1). The intron dataset in general supported Ta (Manacus, (Pipra, Machaeropterus)) except for the SVDquartets tree, while UCEs supported Tb (Machaeropterus, (Manacus, Pipra)), although Leite et al. (2021) strongly supported Ta and the Ta topology was consistent with other recent studies (e.g., McKay et al. 2010; Ohlson et al. 2013; Harvey et al. 2020).

Another incongruence was found within the L. coronata species complex. The 3 subspecies of L. coronata showed deep divergences, similar to the differences among other Lepidothrix species. While concatenation and ASTRAL trees of both UCEs and introns all consistently supported a sister relationship between the 2 Peruvian samples (L. c. exquisita and L. c. coronata) with high support, SVDquartets trees suggested L. c. exquisita sister to the Panamanian sample, L. c. minuscula (Supplementary Fig. S1).

Topological Constraints and Trees Filtering

Based on the concordance factors, we noted ~18% of UCE gene trees (traditional models: 18.3%; expanded models: 18.84%) and ~29% of intron gene trees (traditional: 28.98%; expanded: 28.8%) did not support monophyly for the Antilophia+Chiroxiphia clade (Supplementary Table S4), even though all of our analyses and previous studies consistently identified this as a monophyletic clade supported with a relatively long branch.

We filtered out the gene trees for these non-monophyletic loci. After removal, UCEs and introns produced congruent ASTRAL trees supporting T3 and Ta, regardless of the models used (Table 1). Concatenated trees remained the same as the initial trees for Antilophia+Chiroxiphia, except for introns under the expanded models, which shifted from T3 to T1 but with reduced support. All ASTRAL and concatenated trees supported Ta for Manacus+Pipra+Machaeropterus. SVDquartets trees based on monophyletic loci remained the same, including the relationships within L. coronata.
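The filtering step can be illustrated with a minimal sketch, which is not the pipeline used in this study. It assumes gene trees are available as Newick strings, treats them as unrooted (a clade counts as monophyletic if the tree contains the split separating it from all other taxa), and uses hypothetical taxon labels and locus names.

```python
import re

def leaf_sets(newick):
    """Return (all leaves, leaf set of every internal node) for a Newick string."""
    # Drop branch lengths, then drop support values or labels that follow ')'.
    s = re.sub(r":[^,()\s;]+", "", newick.strip().rstrip(";"))
    s = re.sub(r"\)[^,()]*", ")", s)
    stack, clades, leaves = [], [], set()
    for tok in re.findall(r"[(),]|[^(),\s]+", s):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            clades.append(frozenset(done))
            if stack:
                stack[-1] |= done
        elif tok != ",":
            leaves.add(tok)
            if stack:
                stack[-1].add(tok)
    return leaves, clades

def is_monophyletic(newick, clade):
    """True if the (unrooted) tree contains the split clade | all other taxa."""
    leaves, clades = leaf_sets(newick)
    target = frozenset(clade) & leaves
    if len(target) < 2:
        return True  # fewer than 2 clade members sampled: uninformative
    other = frozenset(leaves - target)
    return any(c == target or c == other for c in clades)

# Hypothetical labels for the Antilophia+Chiroxiphia clade and two toy loci.
CLADE = {"A_galeata", "C_pareola", "C_lanceolata", "C_boliviana", "C_caudata"}
gene_trees = {
    "locus1": "((A_galeata,(C_pareola,C_lanceolata)),(C_boliviana,C_caudata),(Masius,Manacus));",
    "locus2": "((A_galeata,Manacus),(C_pareola,C_lanceolata),((C_boliviana,C_caudata),Masius));",
}
kept = [name for name, nwk in gene_trees.items() if is_monophyletic(nwk, CLADE)]
print(kept)
```

Only the loci in `kept` would be passed on to the species tree analysis; the rest correspond to the non-monophyletic loci described above.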

We also imposed topological constraints on gene trees in 2 ways: constraining monophyly of Antilophia+Chiroxiphia only in gene trees from the non-monophyletic loci (constr1), and simultaneously constraining monophyly of both Antilophia+Chiroxiphia and Manacus+Pipra+Machaeropterus in all gene trees (constr2). The UCE ASTRAL trees (constr1), based on traditional or expanded models, both shifted from T1 to T3, but still supported Tb (Table 1). After constraining monophyly for both Antilophia+Chiroxiphia and Manacus+Pipra+Machaeropterus, the UCE ASTRAL tree (constr2) based on gene trees under traditional models supported T3 and Ta, whereas the ASTRAL tree under expanded models supported T1 and Ta. The intron ASTRAL trees (constr1 and constr2) all supported T3 and Ta. Overall, ASTRAL trees using gene trees under traditional and expanded models exhibited improved congruence after imposing topological constraints, but only became completely congruent across all ASTRAL analyses after removal of non-monophyletic loci.

We then summarized the ASTRAL species trees under traditional and expanded models for the 5 different splits in T1, T2 and T3 for Antilophia+Chiroxiphia (Fig. 2). In general, our ASTRAL trees showed strong support for A. galeata grouping with C. lanceolata/pareola (A, B, C; in both T1 and T3), and also showed better support for a sister relationship between C. boliviana and C. caudata (D, E; in T3) than C. caudata sister to everything else (A, B, C, D; in T1) or C. boliviana sister to everything else (A, B, C, E; in T2). All ASTRAL trees strongly rejected A. galeata sister to C. caudata (C, E; in T2).

Figure 2.

A summary of all ASTRAL species trees based on gene trees estimated under traditional and expanded models (from left to right: UCEs all loci, introns all loci, UCEs & introns all loci combined, UCEs with Antilophia+Chiroxiphia constrained to be monophyletic for the non-monophyletic loci (constr1), UCEs with Antilophia+Chiroxiphia and Manacus+Pipra+Machaeropterus simultaneously constrained to be monophyletic for all loci (constr2), UCEs with only monophyletic loci, introns with one clade constrained (constr1), introns with 2 clades constrained (constr2), introns with only monophyletic loci, and UCEs & introns monophyletic loci combined).

Characteristics of Monophyletic and Non-monophyletic Loci

Compared with the loci that did not support monophyly of Antilophia+Chiroxiphia, the monophyletic UCEs and introns were on average longer and had lower interquartile GC variation, slightly higher GC content, fewer parsimony informative sites in proportion to locus length, and less missing data (Fig. 3; Supplementary Table S5). Among the introns that did not support monophyly, most of the loci (1397/2057; 68%) were in our shortest length category (500–999 bp; Supplementary Table S5). The non-monophyletic loci also had lower gene concordance factors across all nodes in the tree, compared with the monophyletic loci and all loci combined (Supplementary Fig. S18). We did not find a consistent pattern for the average number of sequence pairs that rejected stationarity or homogeneity with statistical significance (Fig. 3; Supplementary Table S6). For the test of recombination signal, we found that the non-monophyletic loci of both UCEs and introns on average exhibited higher mosaic recombination signal (i.e., more putatively recombinant triplets identified).

Figure 3.

Summary statistics for monophyletic and non-monophyletic loci of UCEs and introns. The violin plots show the kernel probability density of the data at different values, and also include a black dot for the mean of the data and black lines indicating the standard deviation. For the number of significant sequence pairs that reject the assumption of stationarity (F), the number of significant sequence pairs that reject the assumption of homogeneity (G) and the number of recombinant triplets (H), zeros were removed from the plots. Metrics that showed significant effects (P < 0.05) in the logistic regression analysis for either UCEs or introns are shaded in gray.

We ran a logistic regression model on the above 8 metrics combined and found that 5 variables explain the differences between monophyletic and non-monophyletic loci with a statistical significance of p-value < 0.05 for all loci and for introns (Table 2). Four of these metrics were significant for UCEs (Table 2). In all cases, the 2 β values with the largest absolute values were the number of informative sites (which decreased the risk of non-monophyly) and number of likely recombinant sequences based on the 3SEQ analysis (which increased the risk of non-monophyly).

Table 2.

Three logistic regression models were run respectively for UCEs, introns and all loci combined, using 8 standardized summary statistics as the explanatory variables, and a binary variable (monophyletic or non-monophyletic) as the response

Variable | All loci combined: β ± SE | All loci combined: P-value | UCEs: β ± SE | UCEs: P-value | Introns: β ± SE | Introns: P-value
Parsimony informative sites | −1.481 ± 0.061 | <2e-16*** | −0.344 ± 0.057 | 1.68e-09*** | −2.257 ± 0.107 | <2e-16***
Proportion of informative sites | 0.537 ± 0.026 | <2e-16*** | not included | not included | 0.479 ± 0.033 | <2e-16***
GC content | −0.04 ± 0.028 | 0.155 | 0.010 ± 0.046 | 0.832 | −0.061 ± 0.036 | 0.093
Interquartile GC variation | 0.164 ± 0.025 | 9.07e-11*** | 0.266 ± 0.041 | 8.97e-11*** | 0.073 ± 0.033 | 0.026*
Missing data | 0.297 ± 0.028 | <2e-16*** | 0.303 ± 0.043 | 2.23e-12*** | 0.306 ± 0.036 | <2e-16***
Stationarity | −0.016 ± 0.029 | 0.584 | 0.058 ± 0.045 | 0.198 | −0.018 ± 0.036 | 0.623
Homogeneity | 0.013 ± 0.024 | 0.587 | 0.003 ± 0.040 | 0.945 | 0.034 ± 0.030 | 0.264
Recombinant triplets | 0.832 ± 0.046 | <2e-16*** | 0.818 ± 0.063 | <2e-16*** | 1.442 ± 0.103 | <2e-16***

Notes: Variables that showed significant effects (p < 0.05) in the logistic regression analyses are in bold. *Indicates 0.01 ≤ p < 0.05, and *** indicates p < 0.001. Since parsimony informative sites and proportion of informative sites were highly correlated for UCEs, the proportion of informative sites was not included in that analysis.
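As a reading aid for Table 2, the following sketch converts standardized logistic regression coefficients into odds ratios. It assumes the response was coded with non-monophyly as the modeled outcome (consistent with the description of β as changing the risk of non-monophyly), so that exp(β) is the multiplicative change in the odds of non-monophyly per one standard deviation increase in a predictor. The dictionary name is illustrative; the values are the significant intron coefficients from Table 2.

```python
import math

# Standardized coefficients (beta) from the introns column of Table 2.
betas_introns = {
    "Parsimony informative sites": -2.257,
    "Proportion of informative sites": 0.479,
    "Interquartile GC variation": 0.073,
    "Missing data": 0.306,
    "Recombinant triplets": 1.442,
}

def odds_ratio(beta):
    """Multiplicative change in the odds per 1 SD increase in the predictor."""
    return math.exp(beta)

for name, beta in betas_introns.items():
    print(f"{name}: odds ratio per 1 SD = {odds_ratio(beta):.3f}")
```

Under this reading, odds ratios below 1 (informative sites) correspond to a lower risk of non-monophyly and odds ratios above 1 (recombinant triplets) to a higher risk.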

Gene trees estimated from the non-monophyletic loci of UCEs and introns had much higher average RF distances from both the T1/Tb and T3/Ta topologies (Fig. 4). For the trees based on monophyletic loci, the RF distances were on average slightly lower when the reference tree was T3/Ta than when the reference tree was T1/Tb (Supplementary Table S7).

Figure 4.

Absolute Robinson–Foulds distances from each gene tree to a reference tree with topology T3/Ta (A) or T1/Tb (B).
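For readers unfamiliar with the metric, the absolute Robinson–Foulds distance between two unrooted trees on the same taxa is the number of nontrivial splits found in one tree but not the other. The sketch below is a minimal illustration and not the software used in this study; it assumes both trees are Newick strings with identical leaf sets, and the 5-taxon trees are toy examples.

```python
import re

def bipartitions(newick):
    """Return the set of nontrivial splits of an unrooted tree given as Newick."""
    # Drop branch lengths, then drop support values or labels that follow ')'.
    s = re.sub(r":[^,()\s;]+", "", newick.strip().rstrip(";"))
    s = re.sub(r"\)[^,()]*", ")", s)
    stack, clades, leaves = [], [], set()
    for tok in re.findall(r"[(),]|[^(),\s]+", s):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            clades.append(done)
            if stack:
                stack[-1] |= done
        elif tok != ",":
            leaves.add(tok)
            if stack:
                stack[-1].add(tok)
    anchor = min(leaves)  # each split is stored as the side lacking this leaf
    splits = set()
    for c in clades:
        side = leaves - c if anchor in c else c
        if 2 <= len(side) <= len(leaves) - 2:
            splits.add(frozenset(side))
    return splits

def rf_distance(tree_a, tree_b):
    """Absolute RF distance: splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

reference = "((A,B),(C,D),E);"
print(rf_distance(reference, "((A,B),(C,E),D);"))  # shares the A,B split
print(rf_distance(reference, "((A,C),(B,D),E);"))  # shares no split
```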

Exploring Biological Sources of Tree Discordance

To examine consistency with the multispecies coalescent model, we looked at frequencies of the 2 minority quartet topologies for both unfiltered datasets and monophyletic loci. All 4 datasets had the same majority topology for the branch connecting A. galeata and C. lanceolata/pareola (branch 1 in Figs. 5 and 6). The majority topology (red bars) was clear in all cases, and the 2 minority topologies for branch 1 had similar frequencies, each <1/3 (shown as blue and teal bars). These results conform to the expectation under the MSC model. As we expected given the ASTRAL analyses (Table 1), the datasets that comprised both monophyletic and non-monophyletic loci had 2 different topologies with majorities (Fig. 5), one consistent with topology T1 for the UCEs (branch 2) and the other consistent with T3 for introns (branch 3). However, both of these datasets had a second topology supported by >1/3 of the gene trees; for UCEs, the intermediate frequency quartet (blue bar) was consistent with T3 and for introns the intermediate frequency quartet (blue bar) was consistent with T1. Thus, the frequencies of quartets favoring both T1 and T3 exceeded 1/3 for both data types; the only difference was which quartet had the highest absolute frequency. These results do not conform to the expectation under the MSC model. However, focusing on the monophyletic loci (Fig. 6), UCEs and introns both had the same most frequent topology for branch 3 and, as expected based on the ASTRAL analyses (Table 1), that topology was concordant with T3 (red bars in Fig. 6). Both minority topologies for branch 3 had a frequency <1/3, indicating that removal of the non-monophyletic loci improved the fit to the multispecies coalescent model. However, a modest asymmetry in the frequencies of the minority topologies was still evident and it was in different directions for UCEs and introns (blue and teal bars).
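The two MSC expectations used here (a single topology above 1/3, and two minority topologies with equal expected frequencies) can be expressed as a small check. This is an illustrative sketch rather than the DiscoVista procedure: the function name and the gene-tree counts are hypothetical, and the symmetry test is a simple chi-square test with 1 degree of freedom that treats gene trees as independent.

```python
import math

def msc_quartet_check(counts):
    """counts: gene-tree counts for the 3 quartet topologies around one branch."""
    total = sum(counts)
    freqs = sorted((c / total for c in counts), reverse=True)
    minor = sorted(counts)[:2]            # the two least frequent topologies
    expected = sum(minor) / 2             # equal minority counts under the MSC
    chi2 = (sum((m - expected) ** 2 / expected for m in minor)
            if expected > 0 else 0.0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "majority_freq": freqs[0],
        "second_exceeds_third": freqs[1] > 1 / 3,
        "minor_asymmetry_p": p_value,
    }

symmetric = msc_quartet_check([500, 250, 250])   # hypothetical counts
conflicting = msc_quartet_check([400, 380, 220])  # hypothetical counts
print(symmetric)
print(conflicting)
```

A second topology above 1/3, or a small p-value for minority asymmetry, flags a branch whose quartet frequencies depart from what ILS alone would produce.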

Figure 5.

Relative frequency analysis using DiscoVista for the gene trees of all loci of UCEs and introns estimated under traditional models. Red bars show the relative frequency of the topology consistent with branch 1, branch 2, and branch 3 in their associated reference species tree topology on the left, whereas blue and teal bars show alternative topologies. The dotted lines at 0.33 indicate the 1/3 threshold for the frequency of gene trees supporting the 2 minority topologies given the multispecies coalescent. For tree tip labels, A is short for A. galeata, cau for C. caudata, bol for C. boliviana, pl for C. pareola plus C. lanceolata, and out for outgroup Masius chrysopterus. Numbers labeled on tree branches in the original DiscoVista output were recoded to facilitate easy comparison between datasets.

Figure 6.

Relative frequency analysis using DiscoVista for the gene trees of the monophyletic loci of UCEs and introns estimated under traditional models. Red bars show the relative frequency of the topology consistent with branch 1 and branch 3 in the reference species tree topology on the left, whereas blue and teal bars show alternative topologies. The dotted lines at 0.33 indicate the 1/3 threshold for the frequency of gene trees supporting the 2 minority topologies given the multispecies coalescent. For tree tip labels, A is short for A. galeata, cau for C. caudata, bol for C. boliviana, pl for C. pareola plus C. lanceolata, and out for outgroup Masius chrysopterus. Numbers labeled on tree branches in the original DiscoVista output were recoded to facilitate easy comparison between datasets.

We also tested for gene flow after divergence to see whether introgression contributed to the asymmetries observed above. The ABBA-BABA tests based on all loci combined consistently provided evidence for gene flow between C. boliviana and all members of the clade comprising C. lanceolata, C. pareola, and A. galeata, but this was only evident when the underlying topology was T3 (Table 3). Very few other tests, including the tests that assume T1, yielded evidence of gene flow. ABBA-BABA tests based on UCEs and introns both showed the same pattern as those based on all loci combined (Supplementary Table S8).

Table 3.

ABBA-BABA tests for all possible combinations of C. pareola, C. lanceolata, A. galeata, C. boliviana, and C. caudata with the relationship of ((P1,P2),P3) using Masius chrysopterus as the outgroup

Results are for all loci combined.

Underlying topology | P1 | P2 | P3 | N | D stat. | SD
T1 | C. boliviana | A. galeata | C. caudata | 3 | 0.046 | 0.133
T1 | C. boliviana | C. lanceolata | C. caudata | 5 | −0.102 | 0.006
T1 | C. boliviana | C. pareola | C. caudata | 9 | 0.109 | 0.017
T3 | C. caudata | C. boliviana | A. galeata | 58 | 0.120 | 0.024
T3 | C. caudata | C. boliviana | C. lanceolata | 83 | 0.132 | 0.034
T3 | C. caudata | C. boliviana | C. pareola | 60 | 0.123 | 0.030
T1 or T3 | A. galeata | C. lanceolata | C. boliviana | 9 | 0.058 | 0.100
T1 or T3 | C. lanceolata | A. galeata | C. caudata | 7 | 0.110 | 0.006
T1 or T3 | C. pareola | A. galeata | C. boliviana | 8 | −0.112 | 0.024
T1 or T3 | C. pareola | A. galeata | C. caudata | 8 | −0.096 | 0.085
T1 or T3 | C. pareola | C. lanceolata | C. boliviana | 7 | −0.117 | 0.014
T1 or T3 | C. pareola | C. lanceolata | C. caudata | 23 | −0.117 | 0.021
T1 or T3 | C. pareola | C. lanceolata | A. galeata | 9 | 0.116 | 0.021

Notes: The underlying topology for the tests is indicated (T1 for 3 triplets, T3 for 3 triplets, and it is compatible with either T1 or T3 for 7 triplets). N is the number of tests with significant D statistics (P-value < 0.05) out of a total of 100 random SNP selections from all loci, D stat. is the average D statistic across N tests, and SD is the standard deviation. Positive D statistics indicate excess shared ancestry for P2 and P3 and negative D statistics indicate excess shared ancestry between P1 and P3.
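The sign convention in these notes follows the standard definition of the D statistic, D = (nABBA − nBABA)/(nABBA + nBABA), for a rooted triplet ((P1,P2),P3) with an outgroup. The sketch below illustrates only that calculation under simplifying assumptions (one allele per taxon at each biallelic SNP, with the outgroup allele treated as ancestral); it does not reproduce the SNP resampling or significance testing used in this study, and the site patterns are invented.

```python
def d_statistic(sites):
    """sites: iterable of (P1, P2, P3, outgroup) alleles at biallelic SNPs."""
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if p1 == out and p2 == p3 and p2 != out:
            abba += 1   # derived allele shared by P2 and P3
        elif p2 == out and p1 == p3 and p1 != out:
            baba += 1   # derived allele shared by P1 and P3
    if abba + baba == 0:
        return float("nan")
    return (abba - baba) / (abba + baba)

# Invented data: 30 ABBA sites, 20 BABA sites, 50 BBAA sites (ignored by D).
sites = ([("A", "G", "G", "A")] * 30
         + [("G", "A", "G", "A")] * 20
         + [("G", "G", "A", "A")] * 50)
d = d_statistic(sites)
print(d)
```

A positive value, as in this toy example, indicates an excess of derived alleles shared by P2 and P3, and a negative value an excess shared by P1 and P3.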

For Bayesian estimation of the phylogenetic networks by PhyloNet, we presented the top 3 most probable networks within the 95% credible set for each analysis (1, 2, or 3 reticulations maximum) of the 3 datasets (all genes combined, UCEs and introns). Among these 27 estimated networks, 26 networks recovered a sister relationship between C. caudata and C. boliviana in the backbone phylogeny, and 20 of the networks recovered a topology that corresponded to T3, and none corresponded to T1 (Supplementary Figs. S19–S21). When the underlying topology was T3, three of the networks also suggested gene flow between the outgroup and C. caudata (depicted in red).

Discussion

Here, we presented a well-supported Pipridae phylogeny with genus-level taxon sampling that largely agreed with a recent phylogenomic study (Leite et al. 2021). Our results further confirmed the monophyly of Antilophia+Chiroxiphia and provided strong support for the hypothesis that Antilophia is nested inside Chiroxiphia. However, our analyses also found instability in the relationships for C. caudata and C. boliviana. This instability was not clearly driven by data type, type of analysis, or model choice. Excluding loci that were not monophyletic for the Antilophia+Chiroxiphia clade, or constraining those gene trees to be monophyletic, increased congruence for intraclade relationships in ASTRAL analyses. Multiple commonly used gene metrics were found to collectively predict the non-monophyletic loci. The data exhibited deviations from expectations given the MSC, likely due to estimation errors and gene flow after divergence.

Identifying Potentially Erroneous Gene Trees

It has long been recognized that misleading signal is nonrandomly distributed in phylogenetic datasets (e.g., Naylor and Brown 1998). This recognition has led to the practice of data filtering in the phylogenomic era (e.g., Jeffroy et al. 2006). The fundamental idea underlying data filtering in phylogenomics is the identification of genomic regions that could potentially yield misleading estimates of phylogeny. In some cases, it is possible to find evidence for fundamental model violations, like variation in base composition (Collins et al. 2005; Jeffroy et al. 2006; Katsu et al. 2009; Reddy et al. 2017). However, in the absence of clear model violations, identifying loci that are especially likely to yield inaccurate estimates of phylogeny is challenging. After all, any estimated gene tree with profound differences from the best overall estimates of the species tree could simply reflect a c-gene with an especially discordant topology. In fact, we expect ILS to result in some c-gene trees that are highly discordant with the true species tree because coalescent times are exponentially distributed (Kingman 1982; Edwards and Beerli 2000). How then can we identify cases where a gene tree is highly discordant due to error rather than ILS?

We reasoned that gene trees that conflict with clades united by a very long coalescent branch in a species tree are more likely to represent estimation errors than genuine discordance. Our argument for this is based on the expected lengths of c-genes; as we stated in the introduction, deep coalescence involving a long branch (in coalescent units) in the species tree leads to short c-genes that reflect the deep coalescence gene trees. Estimating the precise length spectrum of c-genes remains a challenging problem, but the best estimates have been obtained using a coalescent hidden Markov model (HMM) framework (Hobolth et al. 2007). Unfortunately, the coalescent HMM approach is computationally demanding and, at present, is only suitable for the analysis of very long contigs. Thus, it is not appropriate for low-coverage genome data such as we collected. When Hobolth et al. (2007) analyzed the human–chimpanzee–gorilla divergence, they estimated that the shallow coalescence c-genes in 4 different genomic regions ranged from 532 to 2710 bp in length, whereas the deep coalescence c-genes had mean lengths ranging from 41 to 65 bp. This led us to hypothesize that any estimated gene trees that conflict with a clade united by a very long branch in the species tree would be more likely to represent errors rather than genuine discordance.

In our study, we chose the stem branch uniting the Antilophia+Chiroxiphia clade, for which estimates of the coalescent branch length ranged from 1.593 to 2.046 in our ASTRAL trees based on unfiltered datasets. This is much longer than the branch uniting humans and chimpanzees (about 0.55 coalescent units; Hobolth et al. 2007). Thus, c-genes for which the true topology conflicts with monophyly of the Antilophia+Chiroxiphia clade should be even shorter (on average) than the deep coalescence trees for the human–chimpanzee–gorilla analysis, and those loci in which Antilophia+Chiroxiphia are non-monophyletic should be enriched for loci that yield topological errors when analyzed. After all, even if there is a discordant c-gene embedded in one of our relatively long alignments, the majority of the aligned sites in our loci should still have an underlying gene tree congruent with monophyly of the Antilophia+Chiroxiphia clade.
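To give a sense of scale for these branch lengths, the sketch below evaluates the standard three-taxon multispecies coalescent result, in which the probability that a gene tree disagrees with the species tree is (2/3)e^(−T) for an internal branch of length T coalescent units. Applying it to the stem of a five-species clade is only a rough proxy (an assumption made here for illustration), and the function name is hypothetical.

```python
import math

def p_discordant_triplet(t):
    """P(gene tree mismatches the species tree) for a rooted triplet whose
    internal branch has length t in coalescent units."""
    return (2.0 / 3.0) * math.exp(-t)

branches = [
    ("human-chimpanzee branch (about 0.55)", 0.55),
    ("Antilophia+Chiroxiphia stem, lowest estimate", 1.593),
    ("Antilophia+Chiroxiphia stem, highest estimate", 2.046),
]
for label, t in branches:
    print(f"{label}: {p_discordant_triplet(t):.3f}")
```

The longer the branch, the smaller the fraction of the genome expected to carry a genuinely discordant c-gene, which is the premise behind treating conflicts with a long branch as candidates for estimation error.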

There are several complexities associated with our hypothesis that non-monophyletic loci often yield trees with errors. First, c-gene lengths are expected to be geometrically distributed, so it is possible to find some relatively long c-genes with a discordant topology. This is not a major problem because our hypothesis is that the non-monophyletic loci are enriched for loci that yield inaccurate estimates of phylogeny, not that all non-monophyletic loci yield erroneous trees. Second, the length of c-genes depends on the recombination rate as well as the coalescent branch length, so the expected length spectrum of avian c-genes could be longer if typical avian recombination rates are lower than typical mammalian rates. The second issue is also unlikely to be a problem because avian recombination rates appear to be higher than mammalian recombination rates (Backström et al. 2010) and higher recombination rates will yield shorter c-genes. Consistent with our theoretical framework that suggests deep coalescence c-genes are likely to be short, we found that the number of triplet sequences with evidence of recombination based on the 3SEQ analysis was higher in the non-monophyletic loci than in the monophyletic loci (Fig. 3).
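The first point can be made concrete with a short calculation. The sketch below assumes c-gene lengths follow a geometric distribution on 1, 2, ... bp and, purely for illustration, borrows the deep coalescence mean lengths reported for the human–chimpanzee–gorilla analysis (41 to 65 bp) together with the lower bound of our shortest locus length category (500 bp). The manakin values are not known, so the numbers indicate an order of magnitude only.

```python
def prob_longer_than(mean_len, k):
    """P(length > k) for a geometric length distribution with the given mean."""
    p = 1.0 / mean_len          # per-site probability that the c-gene ends
    return (1.0 - p) ** k

for mean_len in (41, 65):
    tail = prob_longer_than(mean_len, 500)
    print(f"mean {mean_len} bp: P(length > 500 bp) = {tail:.2e}")
```

Under these assumptions, discordant c-genes spanning an entire locus are possible but rare, in line with the argument that long discordant c-genes exist without dominating the non-monophyletic loci.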

Data Filtering Resolved Discordant ASTRAL Trees

Many studies have attempted to identify reliable predictors of loci that are prone to gene tree estimation error, such as base compositional stationarity (e.g., Collins et al. 2005), missing data (e.g., Hosner et al. 2016), or information content (e.g., Meiklejohn et al. 2016). However, more recent studies (e.g., Burbrink et al. 2020; Mongiardino Koch 2021) have found that a suite of gene properties better predicts the performance of genes in phylogenetic analyses. Our results agree with those studies, in that multiple characteristics best explained the differences between monophyletic and non-monophyletic loci. For our datasets, the number of parsimony informative sites was the strongest predictor for non-monophyletic introns, whereas signal of recombination was the strongest predictor for non-monophyletic UCEs. In addition, some variables (e.g., GC variation) had strong effects in one dataset but not in the other. Therefore, filtering loci by any single or small number of gene properties could be difficult. However, our results also suggest that a simple topological criterion (conflicts with a long coalescent branch in the species tree), which encompasses a suite of gene properties, might provide useful information about the quality of gene trees and provide an easy way to filter loci.

Using this approach and excluding the non-monophyletic loci from the gene tree analyses increased congruence between UCEs and introns and between different model selections in the ASTRAL species trees. The topology of the Manacus+Pipra+Machaeropterus clade (Ta vs. Tb in Fig. 1) provides another line of evidence that non-monophyletic loci yielded trees that were enriched for gene tree estimation error. There is no obvious reason why removing gene trees of non-monophyletic loci should have an impact on the Manacus+Pipra+Machaeropterus clade; after all, the identification of non-monophyletic loci did not consider the Manacus+Pipra+Machaeropterus clade. Studies with much better taxon sampling in this part of the tree (McKay et al. 2010; Ohlson et al. 2013; Harvey et al. 2020; Leite et al. 2021) found support for Ta. The simplest explanation for this result is that non-monophyletic loci were enriched for loci that yielded inaccurate gene tree topologies and contain conflicting signals for multiple groups, not just for Antilophia+Chiroxiphia.

Using topological constraints on the non-monophyletic loci also improved congruence for Antilophia+Chiroxiphia among ASTRAL analyses. However, the evidence that trees based on non-monophyletic loci may have more errors throughout their gene trees (e.g., errors that impact resolution of the Manacus+Pipra+Machaeropterus clade) suggests that constraining gene trees to include a well-expected clade may be less effective than excluding likely problematic loci. In addition, enforcing constraints may introduce errors if the unconstrained gene trees accurately reflect relationships.

The argument that non-monophyletic loci are enriched for gene tree estimation error implies that the tree recovered in all analyses after removing those loci (T3) is likely to be the best estimate of phylogeny for Antilophia+Chiroxiphia. However, T3 was not present in the set of 7 topologies for the Antilophia+Chiroxiphia clade that was found in the earlier manakin UCE study (Leite et al. 2021). This raises an important question: is T3 reasonable from a biological standpoint? Provocatively, T3 is consistent with a biogeographical study that found C. boliviana more closely related to C. caudata than to C. pareola (Batalha-Filho et al. 2013). Another recent study also found high niche similarity between C. caudata and C. boliviana when compared with all the other Antilophia+Chiroxiphia species (Villegas et al. 2021). Moreover, this close relationship between birds from the Andean Yungas rainforests (e.g., C. boliviana) and the Atlantic Forest (e.g., C. caudata) has been found in many other avian taxa (e.g., Trujillo-Arias et al. 2017, 2018, 2020; Cabanne et al. 2019).

Data Type Effects, Model Fit, and Choice of Methods

Initially, our results also seemed to reflect data type effects, as UCEs tended to support T1 and Tb, whereas introns supported T3 and Ta. Since both UCEs and introns were sampled relatively evenly across the entire genome, it is unlikely this could have reflected a linked or sex-specific inheritance pattern. Although UCEs are expected to be under strong purifying selection, they have been shown to perform more similarly to introns than to exons (Reddy et al. 2017) and have been effectively used in phylogenomic studies for various avian groups (e.g., McCormack et al. 2013; Bryson et al. 2016; Wang et al. 2017; White et al. 2017). The summary statistics of our UCEs were very similar to those of the introns in terms of base composition, GC variation and information content, though UCEs did have more loci (50.52%) that exhibited signals of recombination than did the introns (37.90%) (Supplementary Table S6). It is also possible that UCEs have more complex rate heterogeneity patterns among sites than introns due to the highly conserved cores and increasingly variable flanks, which may make UCEs harder to model. Supporting this possibility, we found that more UCEs (65%) than introns (56%) had a best-fitting model that included FreeRate (rather than gamma-distributed rates and/or invariant sites) when IQTREE was allowed to consider those models. However, using expanded models did not substantially improve gene tree estimation for our datasets, since we found only marginally more non-monophyletic introns for traditional (28.98%) than expanded models (28.8%) and we actually found slightly more non-monophyletic UCEs for expanded (18.84%) than traditional models (18.3%). The data type effects we observed could potentially be reflecting some level of poor model fit, but using more complex and parameter-rich models, like the FreeRate models, did not resolve the topological conflicts. In fact, the gene trees based on expanded models overall yielded lower quartet support in the ASTRAL analyses than those using the traditional models (Table 1).

We initially found 2 competing topologies for the Manacus+Pipra+Machaeropterus clade (Ta vs. Tb); however, Ta was supported by previous studies with much denser taxon sampling within this clade (McKay et al. 2010; Ohlson et al. 2013; Harvey et al. 2020; Leite et al. 2021). Many studies have shown that taxon sampling can have a profound impact on phylogenetic analyses (e.g., Pollock et al. 2002; Zwickl and Hillis 2002) and, despite the fact that some other studies suggested that data type and model fit may have stronger influences than taxon sampling (Braun and Kimball 2002; Reddy et al. 2017), it seems reasonable to view the Ta resolution of Manacus+Pipra+Machaeropterus in the studies with better taxon sampling to be correct. With removal of the non-monophyletic loci, conflicts between data types and between models were resolved for ASTRAL trees. Since removal of the non-monophyletic loci appeared to diminish the impact of data types or model fit for our data, we suggest that we were removing nonphylogenetic signal and gene tree estimation error that can exacerbate problems with limited taxon sampling.

We also observed topological discordance among methods. Overall, we found the concatenation analysis was sensitive to different model selections and data types. Filtering out the non-monophyletic loci from the concatenated dataset did not improve consistency with concatenation. Although SVDquartets trees had identical topologies in all analyses (T1 and Tb), the relationship shown within Lepidothrix coronata contradicts our other results as well as the results from previous studies (Cheviron et al. 2005; Reis et al. 2020; Moncrieff et al. 2022). Notably, the divergence between the Panamanian sample (L. c. minuscula; L. velutina minuscula in Moncrieff et al. 2022) and the 2 Peruvian samples (L. c. exquisita and L. c. coronata) was as deep as the divergences among other Lepidothrix species and with high support in all concatenation and ASTRAL trees. The 2 internal branches within Antilophia+Chiroxiphia are exceptionally short compared to the other branches across the tree, where in theory ILS is even more likely to occur. It has been shown in some simulation studies that summary methods, such as ASTRAL, tend to be more accurate than SVDquartets when there is a high level of ILS and large numbers of sites are available per locus (Chou et al. 2015; Molloy and Warnow 2018). Thus, given that short internal branches can lead to poor performance of concatenation analyses and result in convergence on the incorrect topology even with the addition of more data (Degnan and Rosenberg 2006; Kubatko and Degnan 2007), we suggest that ASTRAL may provide better estimates of the true species relationships in our study. Collectively, this would suggest that T3 may be much more likely than T1 (or T2) and in turn suggests that the traditional models better estimated intron gene trees than did the expanded models in the initial analyses with unfiltered gene trees.

Potential Biological Sources of Discordance

Gene tree discordance and introgression are found to be prevalent across the suboscine radiation (Singhal et al. 2021). In addition to potential gene tree estimation error, we also found evidence for a combination of ILS and introgression that may have contributed to the discordant topologies we observed in Antilophia+Chiroxiphia, although disentangling their effects was challenging. We identified the pattern of one majority topology with 2 co-minor topologies that fits the MSC model for only one of the internal branches, in which the majority topology supports A. galeata, C. lanceolata and C. pareola forming a clade (Figs. 5 and 6, branch 1). This is supported by both UCEs and introns. ILS is not the only source of discordance for the other internal branch (Figs. 5 and 6, branch 2 or branch 3) since the frequencies of the minority quartet topologies were asymmetric and, in the case of the analysis that included both monophyletic and non-monophyletic loci, 2 quartet topologies had frequencies in excess of 1/3 (Fig. 5). The quartet that is concordant with T3 was the clear majority when we focused on the monophyletic loci, although some asymmetry remained for 2 minority topologies. These results would be consistent with gene flow and/or gene tree estimation error in addition to ILS.

We conducted ABBA-BABA tests to examine the hypothesis that there was gene flow among members of the Antilophia+Chiroxiphia clade, with a special focus on patterns that might explain either the recovery of T1 in some analyses if the underlying species tree topology is T3 or the recovery of T3 if the underlying species tree topology is T1. When we examined the 3 rooted triplets consistent with T3 (Table 3) we found evidence for gene flow that involved C. boliviana regardless of the other taxon (C. lanceolata, C. pareola, or A. galeata) in the triplet. If there was gene flow between the C. boliviana lineage and the members of this clade (or their common ancestor) it would result in an excess of topologies that unite C. boliviana with the C. lanceolata+C. pareola+A. galeata clade to the exclusion of C. caudata. Thus, we found evidence for gene flow that could explain the recovery of T1 in some analyses despite an underlying T3 species tree topology. In contrast, there were very few (≤9%) replicates that provided evidence for gene flow when we assumed T1 was the species tree topology, and the direction of estimated gene flows was not consistent with T3 in 2 of 3 cases (Table 3). Therefore, only the gene flow patterns estimated under T3 can consistently explain the recovery of both T1 and T3. This would again suggest that T3 may be more likely to reflect the true species tree topology than T1.

The hypothesis that T3 is likely to be the true species tree is also corroborated by the observation that the average RF distance from all gene trees to the T3/Ta reference tree is shorter than the distance to T1/Tb (Supplementary Table S7). The Bayesian phylogenetic networks show additional support for this. Although most of the estimated reticulations occur at the base of the clade and do not directly address the observed asymmetry, we found a sister relationship recovered for C. caudata and C. boliviana in almost all of the networks and most of them yielded T3 (Supplementary Figs. S19–S21). Three networks also show evidence for reticulation between the C. caudata lineage and the outgroup that could explain the recovery of T1 in some analyses despite an underlying T3 species tree topology.
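The RF comparison described above can be sketched as follows. The Robinson-Foulds distance (Robinson and Foulds 1981) counts the bipartitions found in one tree but not the other. This is a minimal pure-Python illustration and not the code used in this study; trees are written as nested tuples, and the tip labels A to E and the function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of tip labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) of a tree."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        rest = all_leaves - leaves
        if len(leaves) > 1 and len(rest) > 1:
            splits.add(frozenset([leaves, rest]))
        return leaves

    visit(tree)
    return splits

def rf_distance(tree1, tree2):
    """Unrooted Robinson-Foulds distance between two trees on the same tips."""
    if leaf_set(tree1) != leaf_set(tree2):
        raise ValueError("trees must share the same tip labels")
    return len(bipartitions(tree1) ^ bipartitions(tree2))

def mean_rf(gene_trees, reference):
    """Average RF distance from a list of gene trees to a reference tree."""
    return sum(rf_distance(g, reference) for g in gene_trees) / len(gene_trees)

# Toy example with hypothetical topologies.
ref_a = ((("A", "B"), "C"), ("D", "E"))
ref_b = ((("A", "C"), "B"), ("D", "E"))
gene_trees = [ref_a, ref_a, ref_b]
closer_to_a = mean_rf(gene_trees, ref_a) < mean_rf(gene_trees, ref_b)
```

The reference tree with the smaller average distance is the one that the gene trees, taken together, resemble more closely.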

Implications for Manakin Taxonomy

Our results suggest 2 taxonomic revisions within manakins. First, we found deep divergence between different subspecies of Lepidothrix coronata. A recent RADcap study of Lepidothrix (Moncrieff et al. 2022) with a larger number of individuals found a topology identical to ours and a similarly deep divergence. Based on those results, they suggest splitting L. coronata into 2 species. Second, our study presents a strong case for a revision of the genera Chiroxiphia and Antilophia. When our results are considered in light of some other previous studies (e.g., Tello et al. 2009; Ohlson et al. 2013; Silva et al. 2018; Harvey et al. 2020; Leite et al. 2021) it is clear that Antilophia is nested within Chiroxiphia. Thus, Antilophia Reichenbach, 1850 should be subsumed into Chiroxiphia Cabanis, 1847 based on priority (see Bánki et al. 2021).

Conclusions

Our analyses strongly corroborated the overall structure of manakin phylogeny found in prior studies (e.g., Leite et al. 2021), and they indicated that the genus Antilophia is nested within the genus Chiroxiphia. This work also highlighted the importance of gene tree estimation error. We proposed an approach to identify erroneous gene trees that uses monophyly of a “reference clade” united by a long branch in the coalescent tree (in our case the Antilophia+Chiroxiphia clade). We hypothesized that estimated gene trees lacking such a reference clade are likely to be enriched for gene tree estimation errors. We corroborated that hypothesis using a logistic regression to show that factors correlated with estimation error in other studies (e.g., variation in GC content) increase the risk of reference clade non-monophyly. The hypothesis was further corroborated by the fact that removing the gene trees estimated from non-monophyletic loci increased the congruence between this study and other studies in another part of the tree (specifically, congruence increased for the Manacus+Pipra+Machaeropterus clade). Based on these results, we believe that the trees with non-monophyletic loci removed represent the best estimate of the true relationship for these species.
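The reference clade filter described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this study: gene trees are written as nested tuples, the tip labels and locus names are hypothetical, and each tree is treated as unrooted, so the reference clade counts as monophyletic if either it or its complement forms a clade in the arbitrarily rooted tree.

```python
def clades(tree):
    """Return the tip sets of every node in a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = visit(tree)
    return found, all_leaves

def is_monophyletic(tree, reference_clade):
    """True if the reference clade forms one side of a split in the tree."""
    found, all_leaves = clades(tree)
    group = frozenset(reference_clade)
    if not group <= all_leaves:
        raise ValueError("reference clade contains tips missing from the tree")
    return group in found or (all_leaves - group) in found

def partition_loci(gene_trees, reference_clade):
    """Split a {locus: tree} mapping into monophyletic and non-monophyletic loci."""
    keep, flagged = [], []
    for locus, tree in gene_trees.items():
        target = keep if is_monophyletic(tree, reference_clade) else flagged
        target.append(locus)
    return keep, flagged

# Hypothetical tip labels and gene trees.
reference = {"Ant", "Chi1", "Chi2"}
gene_trees = {
    "locus1": ((("Ant", "Chi1"), "Chi2"), ("Man", "Pip")),
    "locus2": ((("Ant", "Man"), "Chi1"), ("Chi2", "Pip")),
    "locus3": ("Ant", ("Chi1", ("Chi2", ("Man", "Pip")))),
}
keep, flagged = partition_loci(gene_trees, reference)
```

Loci in the flagged list correspond to the non-monophyletic loci, which can then be either excluded or re-estimated under a monophyly constraint.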

Our proposed method for the identification of potentially erroneous trees could be of general utility to the systematics community. However, we do not feel that it is appropriate to recommend a precise minimum length for the coalescent branch uniting a potential reference clade based on this study alone. It is clear that the branch uniting the reference clade should be long when measured in coalescent units and the estimate of its length should be based on a relatively large number of loci (defined using the standards of modern data collection). Obviously, a single locus (e.g., barcode data) would not be suitable because it is impossible to estimate a coalescent branch length using a single gene tree. Determining whether the approach identified a set of gene trees that are enriched for errors can be assessed using the glm strategy we employed. Looking for rearrangements elsewhere in the tree may also provide information, although that will depend on the details of the taxon sample. Future studies using our approach in a variety of taxa should yield insights into more criteria that best define a focal reference clade.
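The GLM check mentioned above amounts to a logistic regression of reference clade non-monophyly (coded 1) on locus metrics. The following is a minimal single-predictor sketch fitted by Newton-Raphson in pure Python. It is an independent illustration, not the model or software used in this study, and the predictor values and outcomes are hypothetical.

```python
from math import exp

def sigmoid(z):
    """Logistic function, clamped to avoid overflow in exp()."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(x, y, n_iter=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2 x 2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x is a locus metric, y = 1 if the reference
# clade is non-monophyletic in that locus's gene tree.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1 = fit_logistic(x, y)
fitted = [sigmoid(b0 + b1 * xi) for xi in x]
```

A positive slope b1 would indicate that larger values of the metric increase the risk of non-monophyly. A full analysis would include several metrics and report standard errors, which this sketch omits.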

Acknowledgments

We thank the Louisiana State University Museum of Natural Science for the tissues that made this work possible. We are grateful to Iker Rivas-González for helpful discussions on c-gene lengths. We thank Erfan Sayyari and Siavash Mirarab for their help with DiscoVista configuration. We also thank Luay Nakhleh for insights regarding the use of PhyloNet. The manuscript was improved from comments by the Kimball-Braun lab at the University of Florida. This work would not have occurred without the NSF funded Manakin Genomics RCN DEB 1457541 to Bette Loiselle, Emily DuVal, Christopher Balakrishnan, Michael Braun, and W. Alice Boyle. We are also grateful to the associate editor Sara Ruane and 4 anonymous reviewers for their valuable comments.

Contributor Information

Min Zhao, Department of Biology, University of Florida, Gainesville, FL 32611, USA.

Sarah M Kurtis, Department of Biology, University of Florida, Gainesville, FL 32611, USA.

Noor D White, Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, Bethesda, MD 20892, USA; Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA.

Andre E Moncrieff, Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA.

Rafael N Leite, Graduate Program in Ecology, National Institute of Amazonian Research, Manaus, AM, Brazil.

Robb T Brumfield, Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA.

Edward L Braun, Department of Biology, University of Florida, Gainesville, FL 32611, USA.

Rebecca T Kimball, Department of Biology, University of Florida, Gainesville, FL 32611, USA.

Supplementary Material

Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.c2fqz6191

Funding

This work was supported by the United States National Science Foundation (DEB 1655683 to R.T.K. and E.L.B.; DEB 1501796 to N.D.W.; and DEB 1146265 to R.T.B.).

Author Contributions

Conceptualization was by MZ, RNL, ELB, and RTK. Data curation was by MZ, SMK, and NDW. Formal analyses were conducted by MZ, ELB, and RTK. Funding acquisition was by RTK, ELB, NDW, and RTB. Investigation was by NDW, AEM, and RTK. Methodology was by MZ, NDW, AEM, ELB, and RTK. Project administration was by ELB and RTK. Resources were provided by NDW, AEM, RTB, ELB, and RTK. Software was developed by MZ, SMK, and ELB. Supervision was by ELB and RTK. Validation was by MZ. Visualization was by MZ. Writing - original draft was by MZ, ELB, and RTK. Writing - review and editing was by MZ, SMK, NDW, AEM, RNL, RTB, ELB, and RTK.

References

  1. Arcila D., Ortí G., Vari R., Armbruster J.W., Stiassny M.L.J., Ko K.D., Sabaj M.H., Lundberg J., Revell L.J., Betancur-R R. 2017. Genome-wide interrogation advances resolution of recalcitrant groups in the tree of life. Nat. Ecol. Evol. 1:1–10.
  2. Backström N., Forstmeier W., Schielzeth H., Mellenius H., Nam K., Bolund E., Webster M.T., Öst T., Schneider M., Kempenaers B., Ellegren H. 2010. The recombination landscape of the zebra finch Taeniopygia guttata genome. Genome Res. 20:485–495.
  3. Bánki O., Roskov Y., Vandepitte L., DeWalt R.E., Remsen D., Schalk P., Orrell T., Keping M., Miller J., Aalbu R., Adlard R., Adriaenssens E., Aedo C., Aescht E., Akkari N., Alonso-Zarazaga M.A., Alvarez B., Alvarez F., Anderson G., et al. 2021. Catalogue of Life Checklist (Version 2021-08-25). Catalog. Life. doi: 10.48580/d4sg.
  4. Batalha-Filho H., Fjeldså J., Fabre P.-H., Miyaki C.Y. 2013. Connections between the Atlantic and the Amazonian forest avifaunas represent distinct historical events. J. Ornithol. 154:41–50.
  5. Blackmon H., Adams R. 2015. EvobiR: tools for comparative analyses and teaching evolutionary biology. doi: 10.5281/zenodo.30938.
  6. Bolger A.M., Lohse M., Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.
  7. Braun E.L., Kimball R.T. 2001. Polytomies, the power of phylogenetic inference, and the stochastic nature of molecular evolution: a comment on Walsh et al. (1999). Evolution 55:1261–1263.
  8. Braun E.L., Kimball R.T. 2002. Examining basal avian divergences with mitochondrial sequences: model complexity, taxon sampling, and sequence length. Syst. Biol. 51:614–625.
  9. Bryson R.W. Jr, Faircloth B.C., Tsai W.L.E., McCormack J.E., Klicka J. 2016. Target enrichment of thousands of ultraconserved elements sheds new light on early relationships within New World sparrows (Aves: Passerellidae). Auk Ornithol. Adv. 133:451–458.
  10. Burbrink F.T., Grazziotin F.G., Pyron R.A., Cundall D., Donnellan S., Irish F., Keogh J.S., Kraus F., Murphy R.W., Noonan B., Raxworthy C.J., Ruane S., Lemmon A.R., Lemmon E.M., Zaher H. 2020. Interrogating genomic-scale data for Squamata (lizards, snakes, and amphisbaenians) shows no support for key traditional morphological relationships. Syst. Biol. 69:502–520.
  11. Cabanne G.S., Campagna L., Trujillo-Arias N., Naoki K., Gómez I., Miyaki C.Y., Santos F.R., Dantas G.P.M., Aleixo A., Claramunt S., Rocha A., Caparroz R., Lovette I.J., Tubaro P.L. 2019. Phylogeographic variation within the Buff-browed Foliage-gleaner (Aves: Furnariidae: Syndactyla rufosuperciliata) supports an Andean-Atlantic forests connection via the Cerrado. Mol. Phylogenet. Evol. 133:198–213.
  12. Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L. 2009. BLAST+: architecture and applications. BMC Bioinf. 10:1–9.
  13. Chen L., Qiu Q., Jiang Y., Wang K., Lin Z., Li Z., Bibi F., Yang Y., Wang J., Nie W. 2019. Large-scale ruminant genome sequencing provides insights into their evolution and distinct traits. Science 364:eaav6202.
  14. Chen M.-Y., Liang D., Zhang P. 2017. Phylogenomic resolution of the phylogeny of laurasiatherian mammals: exploring phylogenetic signals within coding and noncoding sequences. Genome Biol. Evol. 9:1998–2012.
  15. Cheviron Z.A., Hackett S.J., Capparella A.P. 2005. Complex evolutionary history of a Neotropical lowland forest bird (Lepidothrix coronata) and its implications for historical hypotheses of the origin of Neotropical avian diversity. Mol. Phylogenet. Evol. 36:338–357.
  16. Chou J., Gupta A., Yaduvanshi S., Davidson R., Nute M., Mirarab S., Warnow T. 2015. A comparative study of SVDquartets and other coalescent-based species tree estimation methods. BMC Genomics 16:1–11.
  17. Collins T.M., Fedrigo O., Naylor G.J.P. 2005. Choosing the best genes for the job: the case for stationary genes in genome-scale phylogenetics. Syst. Biol. 54:493–500.
  18. Degnan J.H., Rosenberg N.A. 2006. Discordance of species trees with their most likely gene trees. PLoS Genet. 2:e68.
  19. Degnan J.H., Rosenberg N.A. 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24:332–340.
  20. Doyle J.J. 1995. The irrelevance of allele tree topologies for species delimitation, and a non-topological alternative. Syst. Bot. 20:574–588.
  21. Doyle J.J. 1997. Trees within trees: genes and species, molecules and morphology. Syst. Biol. 46:537–553.
  22. Durand E.Y., Patterson N., Reich D., Slatkin M. 2011. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28:2239–2252.
  23. Edwards S., Beerli P. 2000. Perspective: gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. Evolution 54:1839–1854.
  24. Edwards S.V. 2009. Is a new and general theory of molecular systematics emerging? Evol. Int. J. Org. Evol. 63:1–19.
  25. Faircloth B.C. 2016. PHYLUCE is a software package for the analysis of conserved genomic loci. Bioinformatics 32:786–788.
  26. Garrison E., Marth G. 2012. Haplotype-based variant detection from short-read sequencing. arXiv Prepr. doi: 10.48550/arXiv.1207.3907.
  27. Green R.E., Krause J., Briggs A.W., Maricic T., Stenzel U., Kircher M., Patterson N., Li H., Zhai W., Fritz M.H.-Y. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.
  28. Harvey M.G., Bravo G.A., Claramunt S., Cuervo A.M., Derryberry G.E., Battilana J., Seeholzer G.F., McKay J.S., O’Meara B.C., Faircloth B.C. 2020. The evolution of a tropical biodiversity hotspot. Science 370:1343–1348.
  29. Hobolth A., Christensen O.F., Mailund T., Schierup M.H. 2007. Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet. 3:e7.
  30. Hosner P.A., Faircloth B.C., Glenn T.C., Braun E.L., Kimball R.T. 2016. Avoiding missing data biases in phylogenomic inference: an empirical study in the landfowl (Aves: Galliformes). Mol. Biol. Evol. 33:1110–1125.
  31. Huang X., Madan A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9:868–877.
  32. Huerta-Cepas J., Serra F., Bork P. 2016. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33:1635–1638.
  33. Huson D.H., Scornavacca C. 2012. Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks. Syst. Biol. 61:1061–1067.
  34. Jarvis E.D., Mirarab S., Aberer A.J., Li B., Houde P., Li C., Ho S.Y.W., Faircloth B.C., Nabholz B., Howard J.T. 2014. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346:1320–1331.
  35. Jeffroy O., Brinkmann H., Delsuc F., Philippe H. 2006. Phylogenomics: the beginning of incongruence? Trends Genet. 22:225–231.
  36. Katoh K., Standley D.M. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30:772–780.
  37. Katsu Y., Braun E.L., Guillette L.J. Jr, Iguchi T. 2009. From reptilian phylogenomics to reptilian genomes: analyses of c-Jun and DJ-1 proto-oncogenes. Cytogenet Genome Res. 127:79–93.
  38. Kingman J.F.C. 1982. The coalescent. Stoch. Process. Appl. 13:235–248.
  39. Kubatko L.S., Degnan J.H. 2007. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Syst. Biol. 56:17–24.
  40. Lam H.M., Ratmann O., Boni M.F. 2018. Improved algorithmic complexity for the 3SEQ recombination detection algorithm. Mol. Biol. Evol. 35:247–251.
  41. Leite R.N., Kimball R.T., Braun E.L., Derryberry E.P., Hosner P.A., Derryberry G.E., Anciaes M., McKay J.S., Aleixo A., Ribas C.C., Brumfield R.T., Cracraft J. 2021. Phylogenomics of manakins (Aves: Pipridae) using alternative locus filtering strategies based on informativeness. Mol. Phylogenet. Evol. 155:107013.
  42. Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27:2987–2993.
  43. Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv Prepr. doi: 10.48550/arXiv.1303.3997.
  44. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079.
  45. Maddison W.P. 1997. Gene trees in species trees. Syst. Biol. 46:523–536.
  46. McCormack J.E., Faircloth B.C., Crawford N.G., Gowaty P.A., Brumfield R.T., Glenn T.C. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
  47. McCormack J.E., Harvey M.G., Faircloth B.C., Crawford N.G., Glenn T.C., Brumfield R.T. 2013. A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. PLoS One 8:e54848.
  48. McKay B.D., Barker F.K., Mays H.L. Jr, Doucet S.M., Hill G.E. 2010. A molecular phylogenetic hypothesis for the manakins (Aves: Pipridae). Mol. Phylogenet. Evol. 55:733–737.
  49. Meiklejohn K.A., Faircloth B.C., Glenn T.C., Kimball R.T., Braun E.L. 2016. Analysis of a rapid evolutionary radiation using ultraconserved elements: evidence for a bias in some multispecies coalescent methods. Syst. Biol. 65:612–627.
  50. Meleshko O., Martin M.D., Korneliussen T.S., Schröck C., Lamkowski P., Schmutz J., Healey A., Piatkowski B.T., Shaw A.J., Weston D.J. 2021. Extensive genome-wide phylogenetic discordance is due to incomplete lineage sorting and not ongoing introgression in a rapidly radiated bryophyte genus. Mol. Biol. Evol. 38:2750–2766.
  51. Mendes F.K., Hahn M.W. 2018. Why concatenation fails near the anomaly zone. Syst. Biol. 67:158–169.
  52. Minh B.Q., Hahn M.W., Lanfear R. 2020a. New methods to calculate concordance factors for phylogenomic datasets. Mol. Biol. Evol. 37:2727–2733.
  53. Minh B.Q., Schmidt H.A., Chernomor O., Schrempf D., Woodhams M.D., Von Haeseler A., Lanfear R. 2020b. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37:1530–1534.
  54. Mirarab S., Reaz R., Bayzid M.S., Zimmermann T., Swenson M.S., Warnow T. 2014. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30:i541–i548.
  55. Molloy E.K., Warnow T. 2018. To include or not to include: the impact of gene filtering on species tree estimation methods. Syst. Biol. 67:285–303.
  56. Moncrieff A.E., Faircloth B.C., Brumfield R.T. 2022. Systematics of Lepidothrix manakins (Aves: Passeriformes: Pipridae) using RADcap markers. Mol. Phylogenet. Evol. 17:107525.
  57. Mongiardino Koch N. 2021. Phylogenomic subsampling and the search for phylogenetically reliable loci. Mol. Biol. Evol. 38:4025–4038.
  58. Naser-Khdour S., Minh B.Q., Zhang W., Stone E.A., Lanfear R. 2019. The prevalence and impact of model violations in phylogenetic analysis. Genome Biol. Evol. 11:3341–3352.
  59. Naylor G.J.P., Brown W.M. 1998. Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst. Biol. 47:61–76.
  60. Nguyen L.-T., Schmidt H.A., Von Haeseler A., Minh B.Q. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32:268–274.
  61. Ohlson J.I., Fjeldså J., Ericson P.G.P. 2013. Molecular phylogeny of the manakins (Aves: Passeriformes: Pipridae), with a new classification and the description of a new genus. Mol. Phylogenet. Evol. 69:796–804.
  62. Page A.J., Taylor B., Delaney A.J., Soares J., Seemann T., Keane J.A., Harris S.R. 2016. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. Microb. Genomics 2:e000056.
  63. Pamilo P., Nei M. 1988. Relationships between gene trees and species trees. Mol. Biol. Evol. 5:568–583.
  64. Pandey A., Braun E.L. 2020. Phylogenetic analyses of sites in different protein structural environments result in distinct placements of the metazoan root. Biology (Basel) 9:64.
  65. Philippe H., Brinkmann H., Lavrov D.V., Littlewood D.T.J., Manuel M., Wörheide G., Baurain D. 2011. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 9:e1000602.
  66. Philippe H., Derelle R., Lopez P., Pick K., Borchiellini C., Boury-Esnault N., Vacelet J., Renard E., Houliston E., Quéinnec E., Da Silva C., Wincker P., Le Guyader H., Leys S., Jackson D.J., Schreiber F., Erpenbeck D., Morgenstern B., Wörheide G., Manuel M. 2009. Phylogenomics revives traditional views on deep animal relationships. Curr. Biol. 19:706–712.
  67. Pollock D.D., Zwickl D.J., McGuire J.A., Hillis D.M. 2002. Increased taxon sampling is advantageous for phylogenetic inference. Syst. Biol. 51:664–671.
  68. Prum R.O., Berv J.S., Dornburg A., Field D.J., Townsend J.P., Lemmon E.M., Lemmon A.R. 2015. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526:569–573.
  69. R Core Team. 2013. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.
  70. Reddy S., Kimball R.T., Pandey A., Hosner P.A., Braun M.J., Hackett S.J., Han K.-L., Harshman J., Huddleston C.J., Kingston S., Marks B.D., Miglia K.J., Moore W.S., Sheldon F.H., Witt C.C., Yuri T., Braun E.L. 2017. Why do phylogenomic data sets yield conflicting trees? Data type influences the avian tree of life more than taxon sampling. Syst. Biol. 66:857–879.
  71. Reis C.A., Dias C., Araripe J., Aleixo A., Anciães M., Sampaio I., Schneider H., do Rego P.S. 2020. Multilocus data of a manakin species reveal cryptic diversification moulded by vicariance. Zool. Scr. 49:129–144.
  72. Revell L.J. 2012. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3:217–223.
  73. Richards E.J., Brown J.M., Barley A.J., Chong R.A., Thomson R.C. 2018. Variation across mitochondrial gene trees provides evidence for systematic error: how much gene tree variation is biological? Syst. Biol. 67:847–860.
  74. Robinson D.F., Foulds L.R. 1981. Comparison of phylogenetic trees. Math. Biosci. 53:131–147.
  75. Roch S., Steel M. 2015. Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theor. Popul. Biol. 100:56–62.
  76. Salichos L., Rokas A. 2013. Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327–331.
  77. Sayyari E., Whitfield J.B., Mirarab S. 2018. DiscoVista: Interpretable visualizations of gene tree discordance. Mol. Phylogenet. Evol. 122:110–115.
  78. Schierup M.H., Hein J. 2000. Consequences of recombination on traditional phylogenetic analysis. Genetics 156:879–891.
  79. Sick H. 1967. Courtship behavior in manakins (Pipridae): a review. Living Bird 6:5–22.
  80. Silva S.M., Agne C.E., Aleixo A., Bonatto S.L. 2018. Phylogeny and systematics of Chiroxiphia and Antilophia manakins (Aves, Pipridae). Mol. Phylogenet. Evol. 127:706–711.
  81. Simion P., Philippe H., Baurain D., Jager M., Richter D.J., Di Franco A., Roure B., Satoh N., Quéinnec E., Ereskovsky A. 2017. A large and consistent phylogenomic dataset supports sponges as the sister group to all other animals. Curr. Biol. 27:958–967.
  82. Simmons M.P., Sloan D.B., Gatesy J. 2016. The effects of subsampling gene trees on coalescent methods applied to ancient divergences. Mol. Phylogenet. Evol. 97:76–89.
  83. Singhal S., Derryberry G.E., Bravo G.A., Derryberry E.P., Brumfield R.T., Harvey M.G. 2021. The dynamics of introgression across an avian radiation. Evol. Lett. 5:568–581.
  84. Snow D. 1963. The evolution of manakin displays. Proceedings of the XIII International Ornithological Congress. Ithaca, NY. p. 553–561.
  85. Snow D. 2020. Lance-tailed manakin (Chiroxiphia lanceolata), version 1.0. In: del Hoyo J., Elliott A., Sargatal J., Christie D.A., and de Juana E., editors. Birds of the world. Ithaca (NY): Cornell Lab of Ornithology. doi: 10.2173/bow.latman1.01.
  86. Song S., Liu L., Edwards S.V., Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc. Natl. Acad. Sci. USA 109:14942–14947.
  87. Swofford D.L. 1998. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sunderland (MA): Sinauer Associates.
  88. Tagliacollo V.A., Lanfear R. 2018. Estimating improved partitioning schemes for ultraconserved elements. Mol. Biol. Evol. 35:1798–1811.
  89. Tello J.G., Moyle R.G., Marchese D.J., Cracraft J. 2009. Phylogeny and phylogenetic classification of the tyrant flycatchers, cotingas, manakins, and their allies (Aves: Tyrannides). Cladistics 25:429–467.
  90. Tricou T., Tannier E., de Vienne D.M. 2022. Ghost lineages highly influence the interpretation of introgression tests. Syst. Biol. 71:1147–1158.
  91. Trujillo-Arias N., Calderón L., Santos F.R., Miyaki C.Y., Aleixo A., Witt C.C., Tubaro P.L., Cabanne G.S. 2018. Forest corridors between the central Andes and the southern Atlantic Forest enabled dispersal and peripatric diversification without niche divergence in a passerine. Mol. Phylogenet. Evol. 128:221–232.
  92. Trujillo-Arias N., Dantas G.P.M., Arbeláez-Cortés E., Naoki K., Gómez M.I., Santos F.R., Miyaki C.Y., Aleixo A., Tubaro P.L., Cabanne G.S. 2017. The niche and phylogeography of a passerine reveal the history of biological diversification between the Andean and the Atlantic forests. Mol. Phylogenet. Evol. 112:107–121.
  93. Trujillo-Arias N., Rodríguez-Cajarville M.J., Sari E., Miyaki C.Y., Santos F.R., Witt C.C., Barreira A.S., Gómez I., Naoki K., Tubaro P.L., Cabanne G.S. 2020. Evolution between forest macrorefugia is linked to discordance between genetic and morphological variation in Neotropical passerines. Mol. Phylogenet. Evol. 149:106849.
  94. Villegas M., Loiselle B.A., Kimball R.T., Blake J.G. 2021. Ecological niche differentiation in Chiroxiphia and Antilophia manakins (Aves: Pipridae). PLoS One 16:e0243760.
  95. Wang N., Hosner P.A., Liang B., Braun E.L., Kimball R.T. 2017. Historical relationships of three enigmatic phasianid genera (Aves: Galliformes) inferred using phylogenomic and mitogenomic data. Mol. Phylogenet. Evol. 109:217–225.
  96. Wen D., Yu Y., Nakhleh L. 2016. Bayesian inference of reticulate phylogenies under the multispecies network coalescent. PLoS Genet. 12:e1006006.
  97. Wen D., Yu Y., Zhu J., Nakhleh L. 2018. Inferring phylogenetic networks using PhyloNet. Syst. Biol. 67:735–740.
  98. Wheeler T.J., Eddy S.R. 2013. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29:2487–2489.
  99. White N.D., Batz Z.A., Braun E.L., Braun M.J., Carleton K.L., Kimball R.T., Swaroop A. 2022. A novel exome probe set captures phototransduction genes across birds (Aves) enabling efficient analysis of vision evolution. Mol. Ecol. Resour. 22:587–601.
  100. White N.D., Mitter C., Braun M.J. 2017. Ultraconserved elements resolve the phylogeny of potoos (Aves: Nyctibiidae). J. Avian Biol. 48:872–880.
  101. Zwickl D.J., Hillis D.M. 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51:588–598.

Articles from Systematic Biology are provided here courtesy of Oxford University Press
