Abstract
Insights into the processes underpinning convergent evolution advance our understanding of the contributions of ancestral, introgressed, and novel genetic variation to phenotypic evolution. Phylogenomic analyses characterizing genome-wide gene tree heterogeneity can provide first clues about the extent of incomplete lineage sorting (ILS) and of introgression, and thereby about the potential of these processes or (in their absence) the need to invoke novel mutations to underpin convergent evolution. Here, we were interested in understanding the processes involved in convergent evolution in open-habitat chats (wheatears of the genus Oenanthe and their relatives). To this end, based on whole-genome resequencing data from 50 taxa of 44 species, we established the species tree, characterized gene tree heterogeneity, and investigated the footprints of ILS and introgression within the latter. The species tree corroborates the pattern of abundant convergent evolution, especially in wheatears. The high levels of gene tree heterogeneity in wheatears are explained by ILS alone only for 30% of internal branches. For multiple branches with high gene tree heterogeneity, D-statistics and phylogenetic networks identified footprints of introgression. Finally, long branches without extensive ILS between clades sporting similar phenotypes provide suggestive evidence for the role of novel mutations in the evolution of these phenotypes. Together, our results suggest that convergent evolution in open-habitat chats involved diverse processes and highlight that phenotypic diversification is often complex and best depicted as a network of interacting lineages.
Keywords: birds, gene tree heterogeneity, incomplete lineage sorting, introgression, mutation
Introduction
Molecular phylogenetics has unveiled many previously unknown examples of convergent evolution—here meant to refer to a phenotypic pattern in which non-sister species are phenotypically more similar to each other than to their respective sister species (following Arendt and Reznick 2008; Stern 2013). Under such an evolutionary outcome, species relationships based on morphometrics, coloration, behavior, or other ecological traits are discordant with the history of descent reflected in the species tree (Elmer and Meyer 2011; Aliabadian et al. 2012; Martin and Orgogozo 2013; Stern 2013; Jarvis et al. 2014; Schweizer et al. 2019a, 2019b; Paterson et al. 2020). While the many observations of such discordances across the tree of life witness the abundance of convergent evolution, insights into the underlying processes remain more elusive.
Phylogenetic information from genomic data now provides unprecedented power to consolidate patterns of convergent evolution and obtain insights into the underlying processes. Many examples of putative convergent evolution are yet based on phylogenies reconstructed from a restricted number of genetic markers (Cresko et al. 2004; Colosimo et al. 2005; Aliabadian et al. 2012; Stern 2013; Brusatte et al. 2015). Since phylogenetic relationships at different positions in the genome, referred to as “gene trees”, can vary substantially, many gene trees inevitably deviate from the species’ history of descent reflected in the species tree (Degnan and Rosenberg 2006; Toews and Brelsford 2012). Hence, the mismatch of single gene trees with phenotypic similarities alone does not provide conclusive evidence for convergent evolution (Doyle 1997; Degnan and Rosenberg 2006; Lamers et al. 2012). Confirming instances of convergent evolution, therefore, requires species tree reconstructions from genome-wide variation. Once the evidence for convergent evolution is corroborated by the species tree, we can move forward to investigate the processes underlying gene tree heterogeneity that may also underpin convergent evolution.
Convergent evolution can occur via three processes: First, phenotypic similarities can evolve through independent mutations in the same or different genes (“parallel evolution” sensu Stern 2013) (Arendt and Reznick 2008; Martin and Orgogozo 2013; Stern 2013). In Mexican cavefish (Astyanax mexicanus), for instance, the evolution of decolored brown phenotypes and albinism in separate caves occurred through different mutations in the MC1R and OCA2 genes (Protas et al. 2006; Gross et al. 2009; Stahl and Gross 2015). Similarly, in plants, isoforms of PEPC found in C4 photosynthesis, and similar floral traits important for pollination have evolved multiple times independently (Whittall et al. 2006; Christin et al. 2007; Hoballah et al. 2007; Besnard et al. 2009; Preston and Hileman 2009).
The second, and likely most frequent, process leading to convergent evolution, which also accounts for most gene tree heterogeneity, is incomplete lineage sorting (ILS; Stern 2013 includes this under “collateral evolution”), that is, the retention of alleles and traits that were already present in the ancestral lineage (Cresko et al. 2004; Colosimo et al. 2005; Stern 2013; Van Belleghem et al. 2018). ILS is prevalent in radiations characterized by large effective population sizes and the fast succession of speciation events, such as in the evolution of neoavian birds (Jarvis et al. 2014; Suh et al. 2015; Suh 2016) or in the diversification of sticklebacks (Colosimo et al. 2005; Jones et al. 2012; Roberts Kingman et al. 2021). In such cases, a high proportion of ancestral variation may be retained over subsequent species splits and continue to segregate in the independent gene pools of daughter species (Maddison 1997). Selection or drift in non-sister species may fix the same genotype (and phenotype), while sister species may fix a different genotype/phenotype. For instance, in humans 1% of the genome is genetically more similar to orangutans than to chimpanzees due to ILS, even though these primates are characterized by small effective population sizes (Hobolth et al. 2011).
Third, in hybridizing lineages, convergent evolution and gene tree heterogeneity may be underpinned by introgression (the exchange of genetic material between species) that mingles genotypes and phenotypes among species (Stern 2013 includes this under “collateral evolution”) (Song et al. 2011; Heliconius Genome Consortium 2012; Stryjewski and Sorenson 2017; Malinsky et al. 2018). In particular, introgression between non-sister species may result in these species being phenotypically more similar than they are to their respective sister species, such as exemplified by wing-pattern mimicry in Heliconius butterflies (Pardo-Diaz et al. 2012; Edelman et al. 2019), and by plumage coloration of Munia finches and of members of the Black-eared Wheatear (Oenanthe hispanica) complex (Stryjewski and Sorenson 2017; Schweizer et al. 2019a). Importantly, in an increasing number of instances, such as in Heliconius butterflies, Yellowstone wolves, Darwin's finches, and cichlid fish, introgression has exchanged alleles between species and resulted in the formation of beneficial phenotypes (Grant et al. 2005; Genner and Turner 2012; Lamers et al. 2012; Wallbank et al. 2016; Enciso-Romero et al. 2017). Given that over the last decade genomic studies have contributed increasing evidence for the abundance of such adaptive introgression, hybridization (the interbreeding of different species) may underpin convergent evolution more often than previously appreciated (Campagna et al. 2017; Han et al. 2017; Meier et al. 2018; Marques et al. 2019a).
Multiple factors influence which of these processes were most likely involved in specific cases of convergent evolution. These factors include the evolutionary time scale under consideration, the speed at which successive speciation events occurred, effective population sizes, and the opportunity for genetic exchange according to biogeographic history. Waiting times for beneficial mutations are long (Hermisson and Pennings 2005; Barrett and Schluter 2008; Hedrick 2013). Independent mutations with the same phenotypic effect are thus usually exceedingly rare (Eyre-Walker and Keightley 2007) and only over the course of millions of years may occur in sufficient number to be a source of convergent evolution (Hedrick 2013; but see Xie et al. 2019). Therefore, at short evolutionary time scales, convergent evolution may more often involve the recruitment of standing genetic variation (Barrett and Schluter 2008), notably from the pool of ancestral variation segregating in extant species, or variation introgressed from other species (Stern 2013); especially since in young lineages ancestral variation is still segregating and because reproductive isolation may still be incomplete between young species.
Phylogenomics can provide important indirect insights into the potential contribution of ILS, introgression, and novel mutations to convergent evolution: First, the species tree provides initial clues on whether speciation events occurred over short enough time scales for ancestral variation to be passed to descendant lineages and thus remain incompletely sorted in important proportions beyond speciation events. Second, insights into the extent of ILS and presence of introgression can be gained from levels of gene tree heterogeneity (Funk and Omland 2003; Degnan and Rosenberg 2006; Jarvis et al. 2014; Nater et al. 2015; Suh et al. 2015; Suh 2016) and symmetries of gene tree frequencies (Hibbins and Hahn 2022). Gene tree heterogeneity is high under both ILS and introgression, but the two processes leave different proportions of alternative gene trees, based on which they can be distinguished (Sayyari and Mirarab 2018; Sayyari et al. 2018; Hibbins and Hahn 2022). In the presence of extensive ILS or of introgression, a parsimonious approach attributes the source of convergent evolution to these processes, even though independent mutations cannot be excluded as the source of convergent evolution (Cresko et al. 2004; Colosimo et al. 2005; Pardo-Diaz et al. 2012; Stryjewski and Sorenson 2017). Failure to detect these processes, conversely, would indirectly suggest novel mutations as a potential source of convergent evolution. Therefore, surveys of gene tree heterogeneity and symmetries of gene tree proportions represent a promising avenue to probe the potential of the alternative processes to contribute to convergent evolution.
Here, we reconstructed the species tree and assessed the contribution of ILS and introgression to gene tree heterogeneity in open-habitat chats (genera Campicoloides, Emarginata, Myrmecocichla, Oenanthe, Pinarochroa, and Thamnolaea), a monophyletic group of songbirds displaying a high incidence of convergent evolution (Mayr and Stresemann 1950; Aliabadian et al. 2012; Schweizer et al. 2019a, 2019b). The phylogenetic relationships among open-habitat chats inferred from mitochondrial data were entirely unexpected from a morphological perspective (Aliabadian et al. 2012). Species similar in plumage coloration and other traits were often spread far apart across the mitochondrial phylogeny, suggesting convergent evolution of phenotypic similarities (Outlaw et al. 2010; Aliabadian et al. 2012; Schweizer and Shirihai 2013; Schweizer et al. 2019a, 2019b). For a limited subset of species studied, genome-wide variation (ddRAD data) confirmed the mitochondrial relationships (Schweizer et al. 2019b). Furthermore, hybridization resulted in substantial introgression in the O. hispanica complex (Schweizer et al. 2019a) and is suspected to have played a role in phenotypic and species evolution in the Oenanthe picata complex (Panov 2005). In these instances, introgression between non-sister taxa may well explain convergent evolution. However, genomic data is essential to corroborate and refine the species tree and assess the incidence of ILS and/or introgression across open-habitat chats.
Based on whole-genome resequencing data from 50 taxa of 44 open-habitat species (supplementary table S1, Supplementary Material online), we aimed to obtain insights into the potential roles of alternative processes in driving convergent evolution in these songbirds. To this end, we 1) reconstructed the species tree, 2) estimated gene tree variation across the genome, and 3) explored ILS and introgression as drivers of the underlying high gene tree heterogeneity. Our results reveal a comprehensive picture of open-habitat chat evolution involving high rates of ILS and multiple instances of introgression particularly in wheatears (genus Oenanthe). Footprints of ILS and introgression as well as considerable divergence times between the main clades of wheatears with convergent evolution suggest that, most likely, a combination of ILS, introgression, and novel mutations explains the convergent evolution observed in wheatears.
Results
Sampling, Nuclear Data Preparation, and Mitogenome Assembly
To achieve an almost complete taxon sampling, we resequenced the genomes of 50 open-habitat chat taxa from 44 of 47 recognized species (fig. 1; supplementary table S1, Supplementary Material online). A Saxicola maurus genome was included as outgroup (Sangster et al. 2010; Zuccon and Ericson 2010). We mapped the sequencing reads to the reference genome assembly of Oenanthe melanoleuca (Peona et al. 2022) and followed GATK best practices for nuclear data preparation. Mapping efficiency was not correlated with the degree of divergence from the reference genome, but data obtained from DNA extracted off museum skins mapped at a lower percentage (linear model, dXY: t = −0.41, P = 0.68; tissuemuseum: t = −6.56, P < 0.001; R2 = 0.53). After mapping, sequencing coverage ranged from 4.6x to 40.6x, with an average coverage of 12.2x ± 6.2x (supplementary table S1, Supplementary Material online). We extracted mitochondrial sequence data for all 13 protein-coding genes and two rRNA genes from the resequencing data using MitoFinder 1.2 (Allio et al. 2020). To ensure that results did not depend on filtering strategy, all analyses were run with four sets of differently filtered data (see Materials and Methods).
Species Tree Reconstruction Based on Nuclear Genomic Data
We first set out to reconstruct and root the species tree based on regions of the genome least likely affected by mapping biases. To this end, we extracted data from genomic intervals hosting avian Benchmarking Universal Single-Copy Orthologs (BUSCO). This resulted in data from 7,335 BUSCO, with alignment lengths of 89,898 to 140,640 kb (depending on filtering strategy), for maximum likelihood (ML) analyses of concatenated data, and in 2,091 BUSCO, with alignment lengths of 10,575 to 15,290 kb, for multispecies coalescent-based species tree reconstruction from LD-pruned data free of inter-locus recombination. Results were consistent among filtering strategies. Hence, we only report results based on the most stringent filtering of read depth (ii, minimum read depth, DP = 5; minimum percentage of the window covered by data, PW = 50%; missing data per site, MD = 15%). Both ML analyses in IQ-TREE 2 based on concatenated data and multispecies coalescent analyses in ASTRAL-III (based on BUSCO ML gene trees) established sub-Saharan species of the genera Campicoloides, Emarginata, Myrmecocichla, Pinarochroa, and Thamnolaea as the sister clade to all other open-habitat chats (supplementary fig. S1a, Supplementary Material online). For the subsequent analyses we excluded the Saxicola outgroup and rooted the trees on the sub-Saharan clade.
We then moved to reconstruct the species tree based on as broad a representation of the genome as possible. To this end, we extracted alignments including variant and invariant sites for non-overlapping 10 kb windows. We henceforth refer to these windowed data as “loci”. Analyses included only loci that fulfilled filtering criteria for read depth, alignment length, data missingness (see Materials and Methods), and absence of evidence for intra-locus recombination. Furthermore, we sub-sampled filtered loci to be at least 10 kb apart to ensure free inter-locus recombination. Depending on filtering strategy, this left us with 5,267–6,791 loci with total alignment lengths of 34,556–52,243 kb (supplementary table S2, Supplementary Material online). We identified branches in the “anomaly zone” (Degnan and Rosenberg 2006) in several clades of wheatears: in the hispanica and picata complexes, and in the isabellina clade (fig. 1). Nevertheless, the polytomy test based on local quartet supports in ASTRAL-III showed no evidence for polytomies in the species tree (P = 0 for all branches). The ML tree based on concatenated data and the multispecies coalescent-based species tree were fully supported and in agreement both with each other (except the position of Thamnolaea cinnamomeiventris within the sub-Saharan clade) and with the tree based on BUSCO (supplementary fig. S1b, Supplementary Material online). Finally, a SNP-based species tree estimated in SVDquartets mostly confirmed the sequence-based results (supplementary fig. S2, Supplementary Material online). The only three disagreements (position of Oenanthe leucura and Oenanthe leucopyga, position of Oenanthe bottae and Oenanthe pileata, and position of T. cinnamomeiventris) were poorly supported in the SNP-based analysis and are likely a result of high levels of ILS under which sequence-based approaches are more accurate than approaches based on SNP data alone (Chou et al. 2015).
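The sub-sampling of loci to a minimum spacing can be illustrated with a short sketch. The text states only that retained loci are at least 10 kb apart; the greedy left-to-right selection, the half-open coordinates, and the names `thin_loci` and `min_gap` are assumptions made for illustration, not a description of the pipeline actually used.

```python
from collections import defaultdict


def thin_loci(loci, min_gap=10_000):
    """Greedily keep loci so that consecutive kept loci on a chromosome
    are separated by at least `min_gap` bp (hypothetical helper).

    `loci` is an iterable of (chromosome, start, end) with half-open
    coordinates, an assumption of this sketch.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in loci:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom in sorted(by_chrom):
        last_end = None
        for start, end in sorted(by_chrom[chrom]):
            # Keep the first locus, then any locus starting >= min_gap
            # after the end of the previously kept locus.
            if last_end is None or start - last_end >= min_gap:
                kept.append((chrom, start, end))
                last_end = end
    return kept


# Toy input: adjacent 10 kb windows that already passed filtering.
windows = [
    ("chr1", 0, 10_000),
    ("chr1", 10_000, 20_000),
    ("chr1", 20_000, 30_000),
    ("chr1", 45_000, 55_000),
    ("chr2", 0, 10_000),
]
thinned = thin_loci(windows)
```

In this toy input, the second window on chr1 is dropped because it directly abuts the first; the remaining windows satisfy the spacing rule.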
Mitogenomic Relationships and Mito-Nuclear Discordances
We were interested in whether previously inferred relationships based predominantly on single mitochondrial genes (Aliabadian et al. 2012; Schweizer and Shirihai 2013; Alaei Kakhki et al. 2016) were supported by full mitogenomes and in inferring mito-nuclear discordances.
Mitogenomic relationships were in remarkable agreement with previously inferred phylogenetic relationships based predominantly on individual mitochondrial genes (Aliabadian et al. 2012; Schweizer and Shirihai 2013; Alaei Kakhki et al. 2016), yet showed several discordances with the species tree recovered from nuclear data (fig. 2). Mito-nuclear discordances in wheatears were found in several places across the species tree but were mostly restricted to the placements of tip taxa: 1) In the lugens complex, nuclear data placed Oenanthe lugens persica within the complex, whereas the mitogenome placed it with Oenanthe xanthoprymna and Oenanthe chrysopygia. 2) In the picata complex, Oenanthe albonigra, which mitochondrial data placed as sister to the picata complex, was placed within the complex by nuclear data, as sister to the phenotypically almost identical Oenanthe picata picata. 3) In the hispanica complex, Oenanthe cypriaca was placed as sister to either O. melanoleuca or Oenanthe pleschanka in nuclear and mitogenomic data, respectively. 4) In the isabellina clade, Oenanthe heuglini clustered as sister to either Oenanthe isabellina or O. bottae by nuclear or mitogenomic data, respectively. 5) Moreover, O. leucura and O. leucopyga formed a sister clade to the Oenanthe lugubris/lugentoides clade according to the nuclear species tree, but mitogenomes placed them consecutively at the root of the clade including Oenanthe finschi and the lugens complex. To understand whether nuclear gene trees were entirely discordant with mitogenomic relationships or in part reflected the latter, for each of the above discordances we checked for nuclear gene trees that agreed with the mitogenomic tree.
This showed that for most of the mito-nuclear discordances, roughly 13–16% of the gene trees agreed with the mitogenomic relationships (picata complex: 14.40%, 4,282 of 29,730 gene trees; hispanica complex: 13.13%, 3,905 of 29,730 gene trees; isabellina clade: 15.71%, 4,671 of 29,730 gene trees). The lugens complex was the exception, with far fewer concordant gene trees (2.77%, 824 of 29,730 gene trees).
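As a quick arithmetic check, the reported percentages can be recomputed from the gene tree counts given above. The counts and the total of 29,730 gene trees are taken from the text; the variable names are illustrative only.

```python
TOTAL_GENE_TREES = 29_730

# Number of nuclear gene trees agreeing with the mitogenomic placement,
# as reported in the text.
concordant = {
    "picata complex": 4_282,
    "hispanica complex": 3_905,
    "isabellina clade": 4_671,
    "lugens complex": 824,
}

# Percentage of all gene trees, rounded to two decimals as in the text.
percentages = {
    clade: round(100 * n / TOTAL_GENE_TREES, 2)
    for clade, n in concordant.items()
}

for clade, pct in percentages.items():
    print(f"{clade}: {pct:.2f}%")
```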
Time Trees
In addition to species’ relationships, we were interested in understanding the time scales at which species diverged. Due to the lack of appropriate fossils, we resorted to first estimating a time-calibrated mitochondrial phylogeny based on the 13 mitochondrial protein-coding genes, for which substitution rates are available (Lerner et al. 2011). The analysis in BEAST 2.6.6 showed high convergence of all parameters in three independent runs after 25% of the trees were discarded as burn-in (ESS >300). The results showed a high agreement with previous results obtained from single genes (Alaei Kakhki et al. 2016), dating the origin of open-habitat chats to the Miocene about 5.67 million years ago (Ma) [95% highest posterior density (HPD): 5.32–6.06 Ma]. The diversification of wheatears (genus Oenanthe) started about 5.09 Ma (95% HPD: 4.75–5.44 Ma) (fig. 1, supplementary fig. S3, Supplementary Material online).
We then used the diversification time of the open-habitat chats estimated from mitochondrial data as a time constraint in dating analyses based on nuclear data. For these analyses, we first provided the topology and branch lengths obtained from ML analyses of concatenated BUSCO data along with 1.8 Mb high-confidence nuclear data (see Materials and Methods) to generate the time-calibrated tree with RelTime-ML (Kumar et al. 2018). Compared with the mitochondrial results, the nuclear data mostly estimated similar divergence times between clades and shorter divergence times within clades (Pearson’s r = 0.93, P < 0.001) (fig. 1, supplementary fig. S3, Supplementary Material online). Second, we performed dating analyses for windowed loci across the genome the same way as for BUSCO by providing 3.8 Mb high-confidence data. Divergence times based on BUSCO strongly correlated with ones estimated from windowed loci (Pearson’s r = 0.99, P < 0.001) (supplementary fig. S4, Supplementary Material online). A test in which we re-ran the estimation of mitochondrial divergence times in RelTime-ML the same way as for nuclear data yielded the same divergence times as estimated in BEAST, thus confirming that differences in divergence times between mitochondrial and nuclear data are not due to the approach but reflect the different data types.
Extensive Gene Tree Heterogeneity
Having established the species tree, we aimed to quantify the levels of gene tree heterogeneity in wheatears to understand whether the processes generating gene tree heterogeneity could underlie convergent evolution in this core group of open-habitat chats that displays the highest incidence of convergent evolution.
Several lines of evidence demonstrate extensive gene tree heterogeneity in wheatears. Remarkably, not a single gene tree out of 29,730 gene trees matched the species tree. Furthermore, many branches of the species tree—including ones with local posterior probability 1—showed a high number of conflicting bipartitions compared with concordant bipartitions, as evidenced by low Internode Certainty All (ICA) scores (supplementary fig. S5, Supplementary Material online), with ICA ranging from 1 to 0.35 and average ICA of 0.65 ± 0.19 (mean ± standard deviation). The high gene tree heterogeneities highlighted by ICA were further supported by low percentages of gene trees recovering the topology of the species tree at these internodes, as estimated by the gene concordance factor (gCF) (fig. 1) that ranged from 1 to 0.06 with an average of 0.52 ± 0.30 (mean ± standard deviation). ICA and gCF were highly correlated (Pearson’s r = 0.94, P < 0.001) (supplementary fig. S5, Supplementary Material online). As expected, evidence for extensive gene tree heterogeneity was highest in clades with branches classified as within the phylogenetic anomaly zone. These included the lugens, picata, and hispanica complexes, the isabellina clade, and the placement of O. leucopyga and O. leucura.
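To make the two summary statistics concrete, the sketch below computes a gene concordance factor and an ICA score for a single branch from bipartition counts. It follows the usual definitions (gCF as the proportion of decisive gene trees containing the branch; ICA as one plus the sum of frequency-weighted log frequencies, with the log base equal to the number of bipartitions considered). The counts are hypothetical, and conventions used in the actual analysis, such as frequency thresholds for conflicting bipartitions, are not reproduced here.

```python
import math


def gene_concordance_factor(n_concordant, n_decisive):
    """Proportion of decisive gene trees that contain the species-tree branch."""
    if n_decisive <= 0:
        raise ValueError("need at least one decisive gene tree")
    return n_concordant / n_decisive


def internode_certainty_all(counts):
    """ICA from the counts of the reference bipartition and every
    conflicting bipartition considered at that internode."""
    counts = [c for c in counts if c > 0]
    n = len(counts)
    if n < 2:
        return 1.0  # no conflict observed
    total = sum(counts)
    return 1.0 + sum((c / total) * math.log(c / total, n) for c in counts)


# Hypothetical branch: 600 gene trees support the species-tree branch,
# 250 and 150 support two conflicting bipartitions.
gcf = gene_concordance_factor(600, 1000)
ica = internode_certainty_all([600, 250, 150])
```

Under these definitions, a branch recovered by every gene tree has ICA = 1, whereas equally frequent conflicting bipartitions drive ICA to 0.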
Contributions of ILS to Gene Tree Heterogeneity
Next, we aimed to understand to what extent the levels of gene tree heterogeneity observed in wheatears can be explained by ILS alone. To this end, we first tested whether the multispecies coalescent without hybridization adequately explains the gene tree heterogeneity observed across the entire species tree. The Tree Incongruence Checking in R (TICR) test (Stenz et al. 2015) showed an excess of outlier quartets (P < 0.01), indicating that a model including ILS but not introgression does not adequately explain the observed gene tree heterogeneity. This suggests that introgression occurred during the evolutionary history of wheatears.
Therefore, we moved on to infer for each branch in the species tree separately whether ILS alone may explain the level of gene tree heterogeneity. To this end, for each internal branch, we estimated the number of gene trees supporting the first and second alternative topologies, based on the rationale that under ILS the first and second alternative gene tree topologies should be supported by an equal number of gene trees (Sayyari and Mirarab 2018). We identified 11 out of 37 internal branches (30%) for which the numbers of gene trees supporting the two alternative topologies were not significantly different (colored branches in fig. 1). At these 11 internal branches, ILS alone can thus explain gene tree heterogeneity, while the asymmetries at the other 26 internal branches may require invoking other processes.
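The rationale of the symmetry test can be sketched as follows. The text states only that the two alternative topologies should be equally frequent under ILS; the use of a chi-square goodness-of-fit test with one degree of freedom is an assumption of this sketch, and the function name and counts are hypothetical.

```python
import math


def alt_topology_symmetry_test(n_alt1, n_alt2):
    """Chi-square test (1 df) of equal support for the two alternative
    topologies around a branch. Returns (chi2, p_value)."""
    total = n_alt1 + n_alt2
    if total == 0:
        raise ValueError("no gene trees support either alternative topology")
    # Expected count under symmetry is total / 2 for each topology,
    # which simplifies the statistic to (a - b)^2 / (a + b).
    chi2 = (n_alt1 - n_alt2) ** 2 / total
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical branch consistent with ILS alone (near-symmetric counts).
chi2_sym, p_sym = alt_topology_symmetry_test(510, 490)

# Hypothetical branch with a marked asymmetry, as expected if
# introgression inflates one alternative topology.
chi2_asym, p_asym = alt_topology_symmetry_test(600, 400)
```

A non-significant result is compatible with ILS alone; a significant asymmetry indicates that an additional process, such as introgression, may need to be invoked.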
Contributions of Introgression to Gene Tree Heterogeneity
Given that gene tree heterogeneity at many branches could not be explained by ILS alone, we set out to infer footprints of introgression across wheatears. To this end, we first applied the approach based on D-statistics (Durand et al. 2011) implemented in Dsuite, using > 58 million biallelic SNPs. This approach estimates D and f4 statistics across all possible combinations of trios in wheatears and then performs an f-branch test to assign gene flow to specific internal branches. The f-branch test suggested multiple events of introgression (fig. 3), namely between: 1) Oenanthe halophila and the ancestor of Oenanthe lugens lugens and Oenanthe warriae, 2) O. xanthoprymna and the ancestor of the lugens complex, 3) O. leucopyga and the ancestor of the O. lugubris/lugentoides clade, 4) Oenanthe picata capistrata and the ancestor of O. picata picata and O. albonigra, and 5) O. melanoleuca and O. pleschanka.
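For readers unfamiliar with the D-statistic, the following minimal sketch counts ABBA and BABA site patterns for one trio (P1, P2, P3) plus an outgroup and computes D = (ABBA − BABA) / (ABBA + BABA). It uses a single haploid sequence per taxon and toy data; it is not the Dsuite implementation, which works from allele frequencies across all trios and assesses significance by block jackknife.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) for aligned sequences of equal length.

    The outgroup allele is treated as ancestral ("A"); only biallelic
    sites contribute.
    """
    abba = baba = 0
    for a1, a2, a3, ao in zip(p1, p2, p3, outgroup):
        if len({a1, a2, a3, ao}) != 2:
            continue  # skip invariant and multiallelic sites
        if a1 == ao and a2 == a3 and a2 != ao:
            abba += 1  # P2 and P3 share the derived allele
        elif a2 == ao and a1 == a3 and a1 != ao:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba), abba, baba


# Toy alignment (hypothetical data): three ABBA sites, one BABA site,
# one site private to P3, and one invariant site.
p1 = "AAGACA"
p2 = "GGAACT"
p3 = "GGGGCT"
out = "AAAACA"
d_stat, n_abba, n_baba = patterson_d(p1, p2, p3, out)
```

Under ILS alone, ABBA and BABA patterns are expected in equal numbers (D ≈ 0); an excess of one pattern, as in this toy example, is the signal interpreted as gene flow between P3 and one of the sister taxa.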
Finally, we corroborated the evidence for introgression in the hispanica, lugens, and picata complexes with multispecies coalescent network analyses in PhyloNet, allowing for 0–5 introgression events. According to the Bayesian Information Criterion (BIC), models involving reticulation events better fit the data than strictly bifurcating trees in all three complexes (supplementary table S3, Supplementary Material online). In the lugens complex, two introgression events were detected: between O. xanthoprymna and the ancestor of O. lugens (γ = 49%), and between O. halophila and the O. lugens lugens–O. warriae ancestor (γ = 25%) (fig. 4). One introgression event was detected in the picata complex, between O. picata capistrata and the ancestor of O. picata picata and O. albonigra (γ = 8%) (fig. 4). In the hispanica complex, the highest-scoring network involved two introgression edges: one between O. melanoleuca and O. pleschanka (γ = 17%), and one between O. hispanica and the O. cypriaca–O. melanoleuca ancestor (γ = 1%) (fig. 4).
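The model comparison by BIC can be sketched as below, using the standard definition BIC = k ln(n) − 2 ln(L), where lower values indicate a better trade-off between fit and complexity. All log-likelihoods, parameter counts, and the number of observations are hypothetical placeholders; how parameters and observations were counted for the networks in this study is not stated here and is not assumed.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical results for networks with 0, 1, and 2 reticulations:
# number of reticulations -> (log-likelihood, number of free parameters)
models = {
    0: (-1050.0, 5),
    1: (-1000.0, 8),
    2: (-998.0, 11),
}
n_gene_trees = 1000  # hypothetical number of observations

scores = {h: bic(ll, k, n_gene_trees) for h, (ll, k) in models.items()}
best_h = min(scores, key=scores.get)
```

In this made-up example, the first reticulation improves the likelihood enough to offset its extra parameters, whereas the second does not, so the one-reticulation network has the lowest BIC.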
Discussion
The present study provides first genomic insights into the speciation history of open-habitat chats and into the processes involved in shaping gene tree heterogeneity that may also underpin the high incidence of convergent evolution in this group of songbirds. Our analyses reveal unambiguous species relationships despite considerable gene tree heterogeneity, including several mito-nuclear discordances that result from a combination of ILS and introgression. These relationships reconstructed from genomic data provide the strongest evidence yet for abundant convergent evolution in open-habitat chats, exemplified for three phenotypes in figure 1.
We first discuss how mito-nuclear discordances and incidences of introgression together with known histories of hybridization and biogeography mold into a comprehensive picture of open-habitat chat evolution. We close by concluding based on the indirect evidence presented here that convergent evolution in open-habitat chats likely involved a combination of ILS, introgression and novel mutations in independent lineages. Together, our results paint a picture of genomic and phenotypic evolution that is in part marked by the sharing of ancestral variation and an exchange of genetic variation between species. Our study therefore contributes to the increasing body of evidence that phenotypic and species evolution not only proceed from novel mutations but abundantly re-use genetic variation present in ancestral and related species (Seehausen et al. 2014; Meier et al. 2018; Marques et al. 2019b).
Mito-Nuclear Discordances, Patterns of Introgression, Hybridization History, and Biogeography Mold into a Coherent Picture of Complex Open-Habitat Chat Evolution
The species relationships inferred from nuclear genomic data were in good agreement with previous phylogenies based predominantly on single mitochondrial markers (Aliabadian et al. 2012; Schweizer and Shirihai 2013) and thereby confirmed the biogeographic history of open-habitat chats (Alaei Kakhki et al. 2016). Still, we recovered several species relationships discordant between the nuclear genome and the mitogenome (Toews and Brelsford 2012). In the light of 1) the histories of introgression also uncovered here, 2) the previously known hybridization history deduced from observed instances of hybridization, and 3) the biogeography confirmed here, most of these mito-nuclear discordances can be well embedded in a coherent history of open-habitat chat evolution.
The close nuclear relationship of O. albonigra with the nominate subspecies O. p. picata is in stark contrast with the mitochondrial divergence of O. albonigra from all O. picata subspecies about 0.5 Ma (fig. 4). However, as an exception for wheatears, even from a perspective of plumage coloration, the nuclear species tree implies a more parsimonious history of phenotypic evolution, as O. albonigra and O. p. picata display almost identical plumages. The high mitochondrial similarity of all subspecies currently treated under O. picata according to Panov (2005) may be a result of introgressive hybridization. Indeed, the high abundance of admixed phenotypes in zones of contact between the members of this species complex (Panov 2005) suggests a high incidence of hybridization. Unlike in the hispanica complex, where taxa meet in restricted zones, lineages of the picata complex all come into contact in a relatively large area in southern Central Asia, and their degree of reproductive isolation is largely unknown. Further population genomic data from the picata complex are required to obtain detailed insights into its history of hybridization and phenotypic evolution.
The evolution of the lugens complex was marked by two incidences of introgression that likely underpin the mito-nuclear discordance observed in this complex (fig. 4). Introgression occurred between O. xanthoprymna and the O. lugens ancestor and between North African O. halophila and the Middle Eastern O. l. lugens–O. warriae ancestor. Both incidences of introgression make sense in the light of biogeography, as they occurred between geographically neighboring taxa (fig. 4). Together they can explain the close mitochondrial relationship of O. l. persica with O. xanthoprymna and O. chrysopygia: O. xanthoprymna mitochondria were introduced into the O. lugens ancestor by hybridization and may at first have segregated in the O. lugens lineage but then have been lost in O. halophila. Mitochondrial replacement with O. halophila variation upon genetic exchange of the latter taxon with the O. l. lugens–O. warriae ancestor would have left O. l. persica the only taxon with an O. xanthoprymna-like mitogenome. Importantly, our results shed first genomic light on the divergence of Basalt Wheatear (O. warriae), a species with a very restricted range that is interesting from the perspective of phenotypic evolution: this species turns out to be highly similar to O. l. lugens at the genomic level, which contrasts with its marked phenotypic divergence (fig. 4). This result is similar to the situation observed, for instance, in Hooded and Carrion Crows (Corvus cornix and Corvus corone, respectively) (Poelstra et al. 2014) and opens interesting questions on the evolutionary history of this taxon’s coloration.
Finally, in the hispanica complex, the incomplete sorting of mitochondrial variation was previously well documented (Randler et al. 2012; Alaei Kakhki et al. 2018), and footprints of introgression came as no surprise: The complex is characterized by pervasive hybridization of O. melanoleuca with O. pleschanka in several geographic regions (Haffer 1977; Panov 1992) and population genomic analyses suggest rates of introgression of up to almost 20% between these species (Schweizer et al. 2019a). Research is underway to uncover the detailed histories of hybridization in this Eurasian wheatear complex.
The mito-nuclear discordances discussed thus far were all accompanied by high levels of gene tree heterogeneity (most within the phylogenetic anomaly zone). However, most of these cases were not explained by ILS alone but went along with footprints of introgression. Still, part of the observed mito-nuclear discordances might well be a consequence of ILS. In the picata complex, for instance, lineage divergence occurred in rapid succession (fig. 1), and ILS might well be an alternative explanation for the mitochondrial divergence of the O. albonigra mitogenome. In addition, in the clade including O. heuglinii and the very widespread O. isabellina, species split in fast succession and the high levels of ILS likely explain the observed mito-nuclear discordance.
Taken together, our results demonstrate that the speciation history of open-habitat chats is as complex as their phenotypic evolution. Multiple events of introgression at both extant and ancestral time scales, along with abundant ILS, contributed to reticulate evolution and thus a mosaic of genomic variation in several clades of wheatears. Our study thus adds to an increasing number of examples (Enciso-Romero et al. 2017; Han et al. 2017; Meier et al. 2017; Lamichhaney et al. 2018) highlighting that species diversification is often complex and, rather than a linear process, is at least in part a network of interacting lineages (Marques et al. 2019b).
Diverse Routes to Convergent Evolution in Open-Habitat Chats
The reconstruction of relationships among open-habitat chats using genomic data has a deep impact on our understanding of phenotypic evolution in these songbirds: the species tree provides firm evidence for an extraordinary incidence of convergent evolution (fig. 1). For numerous traits, including plumage coloration, sexual dimorphism, and migration behavior, non-sister species display more similar phenotypes than sister species (fig. 1). Almost entirely black plumages, for instance, evolved in five clades (Oenanthe picata opistholeuca, O. warriae, O. leucura, female Myrmecocichla monticola, and juvenile O. leucopyga), and sexually monomorphic female-type plumage is found in another five clades (O. chrysopygia, Oenanthe fusca, the Oenanthe melanura clade, the O. isabellina clade, and the sub-Saharan clade), to name just two out of many examples.
Furthermore, our results suggest (directly for introgression and ILS, indirectly for novel mutations) that convergent evolution in open-habitat chats is unlikely explained by a single process but may need to invoke all three processes (Hedrick 2013; Natarajan et al. 2015; Pease et al. 2016; Konečná et al. 2021; Montejo-Kovacevich et al. 2021), with the most likely processes depending on both demography and the phylogenetic scale.
For ILS to substantially contribute to convergent evolution, species must usually diverge in fast succession and maintain critically high effective population sizes to pass on ancestral variation and maintain it in daughter lineages. In open-habitat chats, such fast radiations occurred predominantly at rather recent time scales. The shortest split intervals are observed (in increasing order) in the picata, hispanica, and lugens complexes (fig. 1). However, convergent evolution of species in the lugens complex and of the picata complex is only found with other clades but not within the complexes. Given that the levels of ILS at the root of the lugens complex are restricted, ILS is unlikely to have contributed to convergent evolution with other clades of wheatears sporting, for instance, similar plumages (see for instance the aforementioned example including O. warriae). Convergent evolution is, however, observed for back and neck-side coloration in the hispanica complex (Schweizer et al. 2019a), and could be explained by ILS of ancestral variation.
Likewise, introgression would need to have happened between taxa with similar phenotypes to explain convergent evolution. Our analyses indeed uncovered several instances of in part substantial introgression (figs. 3 and 4). However, despite suggesting that introgression upon hybridization provided the opportunity to exchange phenotypes between species, none of the inferred introgression events can be tied to concrete examples of convergent evolution. This raises the question of whether the methods applied here are underpowered to infer footprints of introgression relevant to the phenotypic evolution of open-habitat chats or whether introgression indeed played a limited role in these songbirds’ phenotypic evolution.
Finally, we may need to invoke novel mutations to explain at least part of the observed convergent evolution, because phenotypic similarities are found between rather divergent species and inferred instances of high ILS and introgression cannot easily explain them. Many if not most phenotypic similarities in open-habitat chats are found in the rather distant major phylogenetic clades that diverged around 5 Ma (for instance the examples provided at the beginning of this section). The time tree suggests that the relevant split events did not occur within short evolutionary time scales. Accordingly, levels of ILS are rather low for at least one of the relevant nodes (fig. 1). Although gene tree heterogeneity was non-negligible for the larger of the two major wheatear clades, gene trees were mostly concordant for the root nodes of the wheatear clade including the hispanica complex and the Oenanthe oenanthe and O. isabellina clades (fig. 1). Moreover, the phenotypically similar species occur in geographically well separated ranges and introgression between them is thus rather unexpected. In conclusion, unless the approaches used here to detect ILS and introgression are underpowered, the indirect evidence provided by our results suggests that many instances of convergent evolution at such time scales may have involved independent novel mutations.
Conclusion
In the present study we set out to probe gene tree variation for footprints of ILS and introgression with the goal of understanding how ILS and introgression may have contributed to convergent evolution in open-habitat chats. Our results reveal a complex speciation history and provide conclusive evidence for abundant convergent evolution in open-habitat chats. While we cannot draw conclusions on the involvement of specific processes in specific instances of convergent evolution, the indirect evidence gained from the structure of the species tree and inferred levels of ILS and introgression suggests that convergent evolution in open-habitat chats likely occurred via all three possible processes, namely ILS, introgression, and novel mutations. Thereby, our results contribute to a growing body of evidence that evolution makes use and re-use of all resources it has at hand, including both standing (ancestral or heterospecific) and novel genetic variation.
Finally, the approach applied here based predominantly on a characterization of gene tree heterogeneity outlines an avenue to probe the processes governing convergent evolution in a wide range of systems. Even though the evidence for the involvement of these processes is indirect, ultimately, at a comparative scale this evidence may provide valuable insights into the relative contributions of ILS, introgression, and novel mutations to convergent evolution.
Materials and Methods
Taxon Sampling, DNA Extraction, and Whole-Genome Resequencing
Aiming for complete taxon sampling, we sequenced the genomes of 50 open-habitat chat taxa from a total of 44 species from the genera Oenanthe, Campicoloides, Emarginata, Myrmecocichla, Pinarochroa, and Thamnolaea (fig. 2; supplementary table S1, Supplementary Material online). This sampling included all but three species (Emarginata tractrac, Myrmecocichla collaris, Thamnolaea coronata) of the 47 currently recognized open-habitat chat species (Gill et al. 2020). A genome sequence of S. maurus (European Nucleotide Archive [ENA] accession number: ERR2560200-ERR2560209), a species of open-habitat chats’ sister lineage (Sangster et al. 2010; Zuccon and Ericson 2010), was included as an outgroup to root the open-habitat chat species tree. We followed the taxonomy of the IOC World Bird List (v12.1) (Gill et al. 2020) except for the picata complex, where we treat subspecies picata, capistrata, and opistholeuca separately, following Panov (2005).
We extracted DNA from blood stored in ≥96% ethanol or Queen's Lysis buffer or tissues stored in 96% ethanol for taxa for which fresh material was available, or from toepads or dried skin from skin-preparation sutures for taxa for which only museum samples were available (supplementary table S1, Supplementary Material online). From blood and tissue samples DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen) or the MagAttract HMW DNA kit (Qiagen) following the manufacturer’s protocol with exception of an adapted digestion of blood samples as reported in Lutgen et al. (2020). DNA from toepads and dried skin was extracted using the QIAamp DNA Micro Kit (Qiagen) with an adapted digestion protocol that ensures high quantities of DNA (dx.doi.org/10.17504/protocols.io.bm4mk8u6). DNA concentrations were quantified on a Qubit fluorometer (dsDNA BR assay, Thermo Fisher Scientific) and DNA integrity was evaluated on a TapeStation (Agilent). We prepared sequencing libraries using the ThruPLEX DNA-Seq Kit (Takara), the Illumina DNA Prep Kit, the Illumina DNA PCR-free Kit, or the Chromium Genome Library kit (10X Genomics) for intact DNA, or for fragmented DNA with the ACCEL-NGS 1S DNA Library Prep Kit (Swift Biosciences) (supplementary table S1, Supplementary Material online). All libraries were sequenced (150 bp paired-end) on Illumina NovaSeq6000 instruments with a target coverage of ca. 15x.
Data Preparation
Adapter Trimming and Mapping of Resequencing Data
Prior to further analysis, for all but the linked-read sequencing data, we trimmed adapters and merged overlapping paired-end reads using fastp 0.20.0 (Chen et al. 2018). For linked-read sequences, we trimmed the first 22 bp of the R1 read to eliminate the 10X indexes. We then mapped the reads to the reference genome assembly of Oenanthe melanoleuca (Peona et al. 2022) using BWA 0.7.17 (Li 2013) and marked duplicates with PicardTools 2.9.1 (http://broadinstitute.github.io/picard). After excluding duplicates, the average sequencing coverage per individual ranged from 4.6x to 40.6x (mean and median 12.2, standard deviation 6.20) (supplementary table S1, Supplementary Material online).
Base Quality Score Recalibration (BQSR), SNP Calling, and SNP Genotyping
Data preparation followed the GATK 4.1.4.1 (McKenna et al. 2010) best practices pipeline. First, to prepare a list of high-confidence SNPs for BQSR, we ran HaplotypeCaller to generate gvcf files for each sample and then merged gvcf files of all samples with CombineGVCFs before genotyping SNPs using GenotypeGVCFs. To include only high-confidence SNPs in the SNP-exclude set for BQSR, we retained only SNPs that fulfilled the following criteria: mapping quality > 40, Fisher strand (FS) phred-scaled P-value < 60, SNP quality score > 20, mapping quality rank-sum value > –12.5, read position rank-sum test value > –8.0, and quality by depth > 2. We retained only biallelic SNPs with at least one homozygous reference and one homozygous alternative genotype or with at least three observations of reference and alternative alleles. We excluded the resulting set of SNPs from BQSR in GATK. Following BQSR, we ran HaplotypeCaller on base-score-recalibrated bam files. The resulting gvcf files of all samples were merged (CombineGVCFs) and variant and invariant sites genotyped using the “--include-non-variant-sites” flag in GenotypeGVCFs. For all subsequent analyses we based genotypes on genotype likelihoods. This resulted in 871,428,254 unfiltered sites when the outgroup was included and 872,152,150 unfiltered sites without the outgroup.
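The six hard-filter thresholds listed above can be written as a single predicate. The following is a minimal sketch rather than the pipeline used here: it assumes that each SNP is available as a plain dictionary and that the six criteria correspond to the usual GATK annotation names (MQ, FS, QUAL, MQRankSum, ReadPosRankSum, QD), which the text does not state explicitly.

```python
# Thresholds are restated from the text. The dictionary keys follow common
# GATK annotation names, which is an assumption made for this illustration.
def passes_hard_filter(site):
    """Return True if a SNP record (a plain dict) meets all six criteria."""
    return (
        site["MQ"] > 40                    # mapping quality
        and site["FS"] < 60                # Fisher strand, phred-scaled P-value
        and site["QUAL"] > 20              # SNP quality score
        and site["MQRankSum"] > -12.5      # mapping quality rank-sum
        and site["ReadPosRankSum"] > -8.0  # read position rank-sum
        and site["QD"] > 2                 # quality by depth
    )
```

Records lacking one of the annotations would raise a KeyError in this sketch; how such sites were treated in the actual filtering is not specified in the text.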
In phylogenomic data sets, which are based on mapping of resequencing data to a reference genome, data of species more divergent from the reference genome may risk mapping at a lower percentage. To check for such mapping-related biases in our dataset, we estimated the average number of nucleotide differences (dXY) between Oenanthe melanoleuca (reference genome) and all other species using pixy 0.95.02 (Korunes and Samuk 2021). We then estimated the mapping percentage for all species using SAMtools (Li et al. 2009) and tested whether there was a correlation between dXY and mapping success.
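As an illustration of this check, the sketch below computes a Pearson correlation coefficient between per-species dXY and mapping percentage. The text does not specify which correlation measure was used, so the choice of Pearson's r is an assumption, as is the function name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

A clearly negative coefficient between dXY and mapping percentage would point to the mapping-related bias described above.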
Data Filtering
Before data analysis, we removed all repeat regions from the multi-sample VCF file using the repeat mask reported in Peona et al. (2022). Then we used BCFtools 1.11 (Li 2011) to remove indels, sites close to indels (up to 10 bp), and all sites at which exclusively alternative alleles were called. For analyses requiring variant sites only, we removed all SNPs with more than 20% missing data and all invariant sites using BCFtools and retained only SNPs with a minimum read depth of five. To minimize linkage disequilibrium (LD) among SNPs, we thinned SNPs in VCFtools 0.1.16 (Danecek et al. 2011) so as to retain only SNPs with a minimum distance of 1 kb between them. This physical distance is expected to remove most LD between SNPs, as, for example, in flycatchers LD breaks down in most genomic regions after 1 kb (Ellegren et al. 2012). After this filtering, we genotyped based on genotype likelihoods and retained 994,150 multiallelic SNPs. In addition, for analyses that require biallelic SNPs exclusively, we removed all multiallelic SNPs from the VCF file after the above filtering, using BCFtools.
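The distance-based pruning can be sketched as a greedy pass along each chromosome. This is an illustration only: the function name and the input layout are hypothetical, and treating a distance of exactly 1 kb as acceptable is an assumption, since the text does not say how boundary cases were handled.

```python
def thin_by_distance(positions_by_chrom, min_dist=1000):
    """Greedily keep SNPs at least `min_dist` bp apart on each chromosome."""
    kept = {}
    for chrom, positions in positions_by_chrom.items():
        kept[chrom] = []
        last = None  # position of the most recently retained SNP
        for pos in sorted(positions):
            if last is None or pos - last >= min_dist:
                kept[chrom].append(pos)
                last = pos
    return kept
```

Because the pass is greedy from the start of each chromosome, the retained set depends on the first SNP encountered, which is acceptable when the aim is only to reduce physical linkage.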
For phylogenomic analyses requiring sequence data including both variant and invariant sites, we followed two strategies. First, we defined 10 kb non-overlapping windows across the genome. Henceforth, we refer to the windowed data as “loci” and to phylogenetic trees inferred therefrom as “gene trees”. Second, we inferred BUSCO, using BUSCO 5.0.0 (Simão et al. 2015). Similar to ultraconserved elements, UCE (Faircloth et al. 2012), BUSCO feature a high degree of conservation and moreover are present in single copies, circumventing issues with paralogs in phylogenomic reconstructions (Roy 2009). BUSCO are readily identified in whole-genome resequencing data sets, not requiring genome alignments, and are increasingly deployed for phylogenomic reconstructions (Kallal et al. 2021; Van Damme et al. 2022).
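The definition of 10 kb non-overlapping windows can be illustrated as follows. This is a sketch under stated assumptions: coordinates are 0-based and half-open, trailing partial windows at scaffold ends are dropped, and the function name is hypothetical. None of these conventions is given in the text.

```python
def nonoverlapping_windows(chrom_lengths, size=10_000):
    """Return (chrom, start, end) tuples for full-size, non-overlapping windows."""
    windows = []
    for chrom, length in chrom_lengths.items():
        # Stop before any window that would run past the scaffold end.
        for start in range(0, length - size + 1, size):
            windows.append((chrom, start, start + size))
    return windows
```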
To make sure that the adopted filtering strategy did not affect our results, we generated four sets of fasta alignments using different filter settings for minimum read depth (DP), minimum percentage of the window covered by data (PW), and maximum percentage of missing data (MD) for both the 10 kb loci and the BUSCO data set: i) DP = 1, PW = 50%, MD = 15%; ii) DP = 5, PW = 50%, MD = 15%; iii) DP = 1, PW = 50%, MD = 5%; and iv) DP = 1, PW = 80%, MD = 10% (supplementary Table S2, Supplementary Material online). These four filtering strategies yielded the same species tree and concatenated tree for 10 kb loci as well as for BUSCO. For these analyses, we therefore exclusively report the results based on the most stringent filtering on read depth (ii, DP = 5, PW = 50%, MD = 15%) (supplementary Table S2, Supplementary Material online). For gene tree heterogeneity analyses, on the other hand, we aimed to include the broadest representation of the genome and to this end retained all loci (N = 29,730) that fulfilled less stringent filtering criteria (i, DP = 1, PW = 50%, MD = 15%) (supplementary Table S2, Supplementary Material online).
Finally, for analyses making assumptions on intra- and inter-locus recombination (such as species tree reconstructions) we made sure to include only loci with no intra-locus but free inter-locus recombination (supplementary Table S2, Supplementary Material online). To this end, we excluded all loci with recombination signals (P ≤ 0.05) as inferred from the pairwise homoplasy index Phi (Φw) estimated in PhiPack 1.1 (Bruen et al. 2006). The criterion P ≤ 0.05 does not account for multiple testing, but we preferred to conservatively exclude loci with evidence for intra-locus recombination. To retain, as far as possible, only loci among which free recombination occurs, we ensured a minimum distance of 10 kb between loci by including no two consecutive loci. At this distance, no LD occurs in flycatchers (Ellegren et al. 2012).
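The two selection rules described here (dropping loci with a recombination signal, then avoiding consecutive windows) might be combined as in the sketch below. The input layout, the function name, and the greedy left-to-right choice among adjacent windows are assumptions made for illustration; the text does not describe how ties between neighboring loci were resolved.

```python
def select_loci(loci, alpha=0.05):
    """Select loci without recombination signal and with no two adjacent windows.

    `loci` is a list of (chrom, window_index, phi_p_value) tuples, where
    window_index numbers the consecutive 10 kb windows along a chromosome.
    """
    kept = []
    last_kept = {}  # chrom -> index of the last retained window
    for chrom, idx, p_value in sorted(loci):
        if p_value <= alpha:
            continue  # evidence for intra-locus recombination
        if chrom in last_kept and idx - last_kept[chrom] < 2:
            continue  # directly adjacent to a retained locus
        kept.append((chrom, idx))
        last_kept[chrom] = idx
    return kept
```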
Inference of BUSCO Sequences
Phylogenomic analyses based on the mapping of resequencing data to a reference genome, especially when including species well diverged from the latter, may be affected by several biases. For species more divergent from the reference genome, data from faster evolving genomic regions 1) risks not being mapped, if these regions are too diverged from the reference sequence, or 2) may map to paralogs, if the species experienced different duplication histories (Chakrabarty et al. 2017; Fitz-Gibbon et al. 2017). These biases are expected to be least important in slowly evolving regions of the genome, especially in BUSCO, that are conserved and by definition present in single copies in most species. To minimize mapping-related biases in our phylogenomic reconstructions, especially on rooting and placements of the most divergent species, we therefore extracted the intervals in which avian BUSCO (aves_odb10) are situated in our reference genome using BUSCO 5.0.0 (Simão et al. 2015).
Phylogenomic Reconstructions and Multispecies Coalescent Analyses
BUSCO-Based Rooting of the Open-Habitat Chat Species Tree
First, to establish the root within open-habitat chats, we applied both concatenation and multispecies coalescent-based methods to BUSCO sequences, including the outgroup. For the concatenation approach, we used all BUSCO (N = 7,335) to estimate the ML tree in IQ-TREE 2.1.2 (Minh et al. 2020b) based on the concatenated BUSCO, with one partition for each BUSCO and a GTR + I + G substitution model for all partitions (Abadi et al. 2019). One thousand bootstrap replicates were run using the ultrafast bootstrap approximation (Hoang et al. 2018). For the coalescent approach, we estimated the species tree under the multispecies coalescent using ASTRAL-III (Zhang et al. 2018) based on BUSCO without recombination signals and with free inter-locus recombination (N = 2,091). To this end, we inferred BUSCO gene trees in IQ-TREE 2.1.2 using a GTR + I + G substitution model and one thousand ultrafast bootstrap approximations. To ensure that species tree inferences were not affected by inaccurately estimated gene trees (Zhang et al. 2018), we collapsed branches with bootstrap support below 80% using Newick Utilities 1.6 (Junier and Zdobnov 2010). Reconstructing the species tree from all BUSCO without considering intra- and inter-locus recombination (N = 7,335) did not affect the result.
Phylogenomic and Multispecies Coalescent Analyses Based on Full Evidence
To reconstruct the concatenated tree and species tree based on full evidence data, that is, data from the maximal possible fraction of the genome, and to study gene tree heterogeneity along the genome, we excluded the Saxicola outgroup. Instead, we rooted the trees with the clade that is the outgroup to all other open-habitat chats (sub-Saharan clade, supplementary fig. S1, Supplementary Material online). Excluding Saxicola ensured that analyses were not biased by mapping issues caused by this outgroup’s divergence.
To estimate the concatenated tree using ML in IQ-TREE 2.1.2 we used all loci with a GTR + I + G substitution model and 1,000 ultrafast bootstrap approximations. To estimate the species tree under the multispecies coalescent using ASTRAL-III, we first estimated ML gene trees using IQ-TREE 2.1.2 with a GTR + I + G substitution model and one thousand ultrafast bootstrap approximations. Based on these gene trees (pruned for within-locus recombination and ensuring free recombination between loci), we inferred the species tree using ASTRAL-III. Because ASTRAL relies on accurately estimated gene trees, we collapsed branches with bootstrap support below 80% using Newick Utilities 1.6.
To find regions of the species tree that represent “anomaly zones” where the frequency of one of the alternative quartets is higher than that of the topology in agreement with the species tree, we estimated local quartet supports for the main topology and its two alternatives in ASTRAL-III (Degnan and Rosenberg 2006). We used the anomaly_finder.py script to search for anomaly zones in our species tree (Linkem et al. 2016). To test if the gene tree discordance could be explained by polytomies instead of bifurcating nodes, we carried out a quartet-based polytomy test as implemented in ASTRAL-III.
To see whether the SNP-based species tree could confirm the sequence-based species tree, we applied the multispecies coalescent model implemented in SVDQuartets (Chifman and Kubatko 2014) in PAUP* 4 (Swofford 2003) to the unlinked multiallelic SNPs. We ran this analysis with 1,000 bootstrap replicates and summarized the result in a 50% majority-rule consensus tree.
Phylogenetic Relationships of Mitogenomes
We were interested in whether previously inferred relationships based predominantly on single mitochondrial genes (Aliabadian et al. 2012; Schweizer and Shirihai 2013; Alaei Kakhki et al. 2016) were supported by full mitogenomes and in how the mitogenomic relationships compare with the ones inferred from nuclear loci. To this end, we extracted and assembled mitochondrial genomes from the genomic data of all open-habitat chats using MitoFinder 1.2 (Allio et al. 2020). We used the published Isabelline Wheatear (Oenanthe isabellina) mitochondrial genome as a reference (Genbank accession number: NC_040290.1) and annotated the mitochondrial genome using the annotation pipeline integrated in MitoFinder. Finally, we aligned the 13 mitochondrial protein-coding gene sequences using the automatic alignment strategy in MAFFT 7.471 (Katoh and Standley 2013). We checked the alignments in AliView 1.26 (Larsson 2014) and removed stop codons within the coding sequences or indels for downstream analyses. We determined the best partition scheme using the Akaike information criterion (AIC) implemented in PartitionFinder 2.1.1 (Lanfear et al. 2017) and used the GTR + G + I model for all partitions. Then we constructed the ML tree from the concatenated supermatrix of all 13 genes in IQ-TREE 2.1.2 using the ultrafast bootstrap approximations with 1,000 replicates.
Dating Analyses
Besides species’ relationships we were interested in estimating divergence times in open-habitat chats. Because there are no appropriate fossils for calibration, we first ran BEAST 2.6.6 (Bouckaert et al. 2019) for 13 mitochondrial protein-coding genes to estimate a time-calibrated mitochondrial phylogeny. We included the mitochondrial genome sequence of S. maurus (GenBank accession number: MN356403.1) as an outgroup in these analyses. Substitution models were inferred during the MCMC analyses with bModelTest (Bouckaert and Drummond 2017) implemented as a package in BEAST 2.6.6. Published substitution rates for each mitochondrial gene (Lerner et al. 2011) were implemented as means of the clock rates in real space of lognormal distribution with standard deviations of 0.005. We defined a Yule speciation process for the tree prior and an uncorrelated lognormal relaxed clock model. Three independent MCMC chains were run for 50 million generations, each with sampling every 5,000 generations. Effective sample sizes (ESS) for all parameters and appropriate numbers of burn-in generations were checked with Tracer 1.5 (Rambaut and Drummond 2009). The three independent runs were combined using LogCombiner 2.6.6 (Bouckaert et al. 2019). We used TreeAnnotator 2.6.6 (Bouckaert et al. 2019) to calculate a maximum clade credibility tree and the 95% HPD distributions of each estimated node.
We then used the divergence time of the sub-Saharan clade from wheatears estimated from mitochondrial data as a time constraint in dating analyses based on nuclear data using RelTime-ML implemented in MEGA 11 (Tamura et al. 2021). For this analysis, we provided the topology with branch lengths estimated in IQ-TREE based on concatenated BUSCO data retained after the most stringent filtering (ii, DP = 5, PW = 50%, MD = 15%), along with high-confidence BUSCO alignments. The latter consisted of BUSCO data filtered for DP = 5 and MD = 5%, retaining only BUSCO alignments longer than 1 kb. We used the same filtering to obtain the 10 kb non-overlapping windows across the genome and used the concatenated tree retained after the most stringent filtering (ii, DP = 5, PW = 50%, MD = 15%) to repeat the analyses based on loci across the genome. To ensure that the differences in divergence times between mitochondrial and nuclear data were not due to the different dating approaches, we re-estimated the mitochondrial divergence times in RelTime-ML using the same approach as for the nuclear datasets.
Inference of Gene Tree Variation, ILS, and Introgression
Inference of the Levels of Gene Tree Variation
To investigate gene tree heterogeneity across the genome, we used gene trees inferred from loci retained under the less stringent filtering criteria (i, DP = 1, PW = 50%, MD = 15%) as described above. To infer how many gene trees reflect the species tree topology, we used the script ‘findCommonTrees.py’ (Edelman et al. 2019). To characterize the levels of gene tree heterogeneity across open-habitat chats, we compared the gene trees with the species tree. Specifically, we estimated the internode certainty all (ICA) and the gene concordance factor (gCF). ICA quantifies the amount of gene tree heterogeneity for each internode of the species tree by comparing the frequency of the corresponding bipartition with the frequencies of all most prevalent conflicting bipartitions. It takes values ranging from −1 to 1, with values around zero indicating strong conflict; values toward 1 indicate robust concordance of gene trees with the species tree in the bipartition of interest; and negative values indicate discordance between the bipartition of interest and one or more bipartitions with a higher frequency (Salichos et al. 2014). While ICA thus represents the degree of conflict on each node of a species tree, gCF better reflects the gene tree heterogeneity around each branch: it is the percentage of decisive gene trees that contain the branch and is complemented by the percentages of gene trees supporting the two alternative topologies for that branch (Minh et al. 2020a). We estimated ICA and gCF with PhyParts 0.0.1 (Smith et al. 2015) and IQ-TREE 2.1.2, respectively.
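To make the two summary statistics concrete, the sketch below computes both from gene tree counts for a single branch. It is an illustration, not the PhyParts or IQ-TREE implementation: the ICA function assumes the entropy-based definition of Salichos et al. (2014) together with the sign convention described above, ignores any frequency threshold for conflicting bipartitions, and both function names are hypothetical.

```python
import math

def concordance_percentages(n_concordant, n_alt1, n_alt2, n_other=0):
    """Percentages of decisive gene trees supporting the species-tree branch,
    the two alternative resolutions, and any other (e.g. paraphyletic) outcome."""
    counts = (n_concordant, n_alt1, n_alt2, n_other)
    total = sum(counts)
    if total == 0:
        raise ValueError("no decisive gene trees")
    return tuple(100.0 * n / total for n in counts)

def internode_certainty_all(focal, conflicting):
    """Entropy-based ICA from the count of gene trees with the focal
    bipartition and the counts of each conflicting bipartition."""
    counts = [c for c in [focal, *conflicting] if c > 0]
    if not counts:
        raise ValueError("no gene trees")
    n = len(counts)
    if n == 1:
        ica = 1.0  # a single bipartition: no conflict to measure
    else:
        total = sum(counts)
        ica = 1.0 + sum((c / total) * math.log(c / total, n) for c in counts)
    # Negative sign when a conflicting bipartition is more frequent.
    if conflicting and max(conflicting) > focal:
        ica = -ica
    return ica
```

With equal support for the focal and one conflicting bipartition the ICA is zero, and it approaches 1 as the focal bipartition dominates, in line with the interpretation given above.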
Tests of an ILS Model
Next, we were interested in understanding whether ILS can sufficiently explain the level of gene tree heterogeneity observed at the level of the whole species tree. To this end, we applied the TICR test (Stenz et al. 2015) implemented in the phylolm R package. This test evaluates whether the multispecies coalescent adequately explains gene tree heterogeneity across the species tree with no hybridization edges. TICR requires posterior distributions of gene tree topologies inferred through Bayesian inference of gene trees. Therefore, we first estimated posterior distributions of individual gene trees with MrBayes 3.2.7 (Ronquist et al. 2012). MrBayes analyses consisted of three independent runs of 20 million generations each, sampling every 20,000th generation using a GTR + I + G model. We estimated the length of burn-in using Tracer 1.5 (Rambaut and Drummond 2009) to ensure that our sampling of the posterior distribution had reached sufficient ESS (ESS > 200) for parameter estimation. We then ran BUCKy (Ané et al. 2007; Larget et al. 2010) using the posterior distribution of gene trees after discarding 25% as burn-in to estimate the concordance factors (CFs) for the three possible splits of all quartets. The inferred CF values were then tested against those expected under a coalescent model that takes ILS but not hybridization into account (χ2 test).
We then tested for each branch in the species tree whether the gene tree heterogeneity reflected in gCF can be sufficiently explained by a model incorporating ILS alone. Under ILS alone (assuming that sorting of variation occurs by random genetic drift), the proportions of the two discordant gene tree topologies for a rooted triplet are expected to be approximately equal (Sayyari and Mirarab 2018; Sayyari et al. 2018; Hibbins and Hahn 2022), and the concordant tree topology (the topology in agreement with the species tree) should be at least as frequent as the two discordant topologies (Sayyari et al. 2018; Hibbins and Hahn 2022). In contrast, introgression between non-sister taxa results in asymmetric proportions of gene trees in the rooted triplet (Green et al. 2010; Durand et al. 2012). Therefore, we performed a χ2 test comparing the number of gene trees supporting the two discordant topologies. Under ILS, these two alternative topologies are expected to be equally frequent among gene trees (He et al. 2020). For all these analyses we accounted for uncertainty in gene tree topologies by collapsing branches with bootstrap support <80%.
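The symmetry test for the two discordant topologies amounts to a χ2 goodness-of-fit test with one degree of freedom. The sketch below is one way to compute it using only the Python standard library; the function name is hypothetical, and the P-value relies on the identity that the upper tail of a χ2 distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def discordant_symmetry_test(n_alt1, n_alt2):
    """Chi-square test (1 df) of equal frequencies of two discordant topologies.

    Returns (statistic, p_value).
    """
    total = n_alt1 + n_alt2
    if total == 0:
        raise ValueError("no discordant gene trees to compare")
    expected = total / 2  # equal counts expected under ILS alone
    stat = ((n_alt1 - expected) ** 2 + (n_alt2 - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A small P-value indicates an excess of one discordant topology, the asymmetry expected under introgression rather than under ILS alone.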
Inferring Footprints of Introgression
To infer footprints of introgression across the entire species tree, we estimated Patterson’s D (Durand et al. 2011) and related statistics in Dsuite (Malinsky et al. 2021) based on 58,963,109 biallelic SNPs. D and f4 statistics were estimated across all possible combinations of trios in our 38 wheatear taxa. We used Dtrios to calculate the sums of three different patterns (BABA, BBAA, and ABBA) and D and f4 ratio statistics for all 8,437 possible trios. Dsuite uses the standard block-jackknife procedure to assess the significance of the D statistic. Due to the large number of D-statistics comparisons and difficulties disentangling false positives that may arise due to ancient gene flow, we performed the f-branch test (fb) implemented in Dsuite to assign gene flow to specific internal branches on the species tree. Then, we visualized the output using Dsuite’s dtools.py script.
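For orientation, Patterson's D for a single trio contrasts the ABBA and BABA site-pattern sums, and its significance is assessed by a block jackknife. The sketch below is a simplified illustration and not the Dsuite implementation: it uses a delete-one jackknife with equally weighted blocks, and the function names and input layout are assumptions.

```python
import math

def patterson_d(abba, baba):
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA)."""
    denom = abba + baba
    if denom == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / denom

def jackknife_z(blocks):
    """Delete-one block jackknife Z-score for D.

    `blocks` is a list of (abba, baba) sums, one pair per genomic block.
    Returns (D, Z); Z is NaN if the jackknife standard error is zero.
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    tot_abba = sum(a for a, _ in blocks)
    tot_baba = sum(b for _, b in blocks)
    d_all = patterson_d(tot_abba, tot_baba)
    # D recomputed with each block left out in turn.
    loo = [patterson_d(tot_abba - a, tot_baba - b) for a, b in blocks]
    mean = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((d - mean) ** 2 for d in loo))
    return d_all, (d_all / se if se > 0 else math.nan)
```

Under ILS alone D is expected to be close to zero; a strongly positive or negative value with a large absolute Z-score suggests excess allele sharing between two of the three taxa.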
We then aimed to obtain further support for the footprints of introgression that were suggested in the lugens, picata, and hispanica complexes by the above approach based on the D-statistics. To this end, for these three complexes, we estimated phylogenetic networks from ML trees generated from BUSCO using the pseudolikelihood (InferNetwork_MPL) (Yu and Nakhleh 2015) and likelihood (CalGTProb) (Yu et al. 2014) approaches implemented in PhyloNet 3.6.9 (Than et al. 2008). Due to the high computational demands, analyses were run separately for each of the clades containing signals of introgression in earlier analyses, namely for the lugens, picata, and hispanica complexes. Furthermore, we only included BUSCO loci that had data available for all taxa of the respective complex. Outgroup species for each complex were selected based on the species tree. Analyses included 7,323 rooted gene trees for the lugens complex, 7,310 rooted gene trees for the picata complex, and 7,335 rooted gene trees for the hispanica complex. For each complex, we allowed for one to five reticulation events, with the starting tree corresponding to the species tree topology (-s), a bootstrap threshold of 0.9 for gene trees (-b), and 1,000 iterations (-x). To ensure convergence, the network searches were repeated 10 times. Then we estimated the likelihood by fixing the topology of the focal clade for the species tree (without any reticulation) and for each of the five networks (with different numbers of introgression edges) and calculated their likelihood scores. We determined the optimal network by calculating the BIC from the ML scores, the number of gene trees, and the number of branch lengths being estimated plus the number of admixture edges in each model (supplementary Table S3, Supplementary Material online). We used the browser-based tree viewer IcyTree (Vaughan 2017) to visualize the estimated networks.
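The model comparison can be summarized with the standard BIC formula, BIC = k ln(n) − 2 ln L, where n is the number of gene trees and k counts the estimated branch lengths plus the admixture edges, as described above. The following sketch assumes exactly this parameter count; the function names are hypothetical.

```python
import math

def bic(log_likelihood, n_branch_lengths, n_admixture_edges, n_gene_trees):
    """BIC = k * ln(n) - 2 * lnL, with k = branch lengths + admixture edges."""
    k = n_branch_lengths + n_admixture_edges
    return k * math.log(n_gene_trees) - 2.0 * log_likelihood

def best_model(models, n_gene_trees):
    """Return the name of the model with the lowest BIC.

    `models` maps a model name to a tuple
    (log_likelihood, n_branch_lengths, n_admixture_edges).
    """
    return min(models, key=lambda name: bic(*models[name], n_gene_trees))
```

Because the penalty grows with ln(n), an additional reticulation is only favored if it raises the log-likelihood by more than roughly half the penalty for the parameters it adds.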
Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.
Acknowledgments
We thank all the natural history museums and their staff who provided material for this study, namely the American Museum of Natural History, New York; Natural History Museum, Tring; Field Museum of Natural History, Chicago; Natural History Museum of Los Angeles County, Los Angeles; Museum of Vertebrate Zoology, UC Berkeley; Natural History Museum, University of Oslo; Naturhistoriska Riksmuseet, Stockholm; Texas A&M University Biodiversity Research and Teaching Collections, College Station, Texas; University of Washington Burke Museum, Seattle; Yale Peabody Museum; Zoological Museum, Natural History Museum of Denmark, Copenhagen; Zoologisches Forschungsmuseum König, Bonn; and Martin Haase, Vogelwarte Hiddensee, Universität Greifswald. Further samples were provided by José Luis Copete and Marc Illa. We are indebted to the sequencing facilities of NGI Sweden in Solna and to the NGS platform of the University of Berne and their respective staff for their excellent services and to Marta Burri for sequencing library preparations. Computations were performed at the High-Performance Computing Cluster EVE, a joint effort of the Helmholtz Centre for Environmental Research (UFZ) and the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig. We thank the administration and support staff of EVE, Thomas Schnicke and Ben Langenberg (UFZ), and Christian Krause (iDiv). We thank Chris Rose and Claire Weatherhead from Bloomsbury Publishing Plc for their permission to use bird drawings in our figures. This research was supported by a German Research Foundation (DFG) research grant (BU3456/3-1) to R.B., the National Research Fund (FNR), Luxembourg, grant number 14575729 to D.L., and a Georg Forster Research Stipend of the Alexander von Humboldt Foundation and a scholarship for female researchers from Friedrich-Schiller-University Jena, both to N.A.K.
Contributor Information
Niloofar Alaei Kakhki, Department of Population Ecology, Institute of Ecology and Evolution, Friedrich-Schiller-University Jena, Jena, Germany.
Manuel Schweizer, Natural History Museum Bern, Bern, Switzerland; Institute of Ecology and Evolution, University of Bern, Bern, Switzerland.
Dave Lutgen, Department of Population Ecology, Institute of Ecology and Evolution, Friedrich-Schiller-University Jena, Jena, Germany; Institute of Ecology and Evolution, University of Bern, Bern, Switzerland; Swiss Ornithological Institute, Sempach, Switzerland.
Rauri C K Bowie, Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA; Department of Integrative Biology, University of California, Berkeley, CA, USA.
Hadoram Shirihai, Natural History Museum Bern, Bern, Switzerland.
Alexander Suh, School of Biological Sciences, University of East Anglia, Norwich, United Kingdom; Department of Organismal Biology – Systematic Biology (EBC), Science for Life Laboratory, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
Holger Schielzeth, Department of Population Ecology, Institute of Ecology and Evolution, Friedrich-Schiller-University Jena, Jena, Germany; German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Leipzig, Germany.
Reto Burri, Department of Population Ecology, Institute of Ecology and Evolution, Friedrich-Schiller-University Jena, Jena, Germany; Institute of Ecology and Evolution, University of Bern, Bern, Switzerland; Swiss Ornithological Institute, Sempach, Switzerland.
Author Contributions
R.B. and N.A.K. designed the study. N.A.K., D.L., and M.S. performed data analysis with inputs from H.Sc. and R.B. R.C.K.B., A.S., and H.Sh. provided materials. M.S. designed the figures. N.A.K. and R.B. wrote the manuscript with help from M.S. and H.Sc. and inputs from all authors.
Data Availability
All sequencing data produced in the framework of this study are available on the ENA under project accession PRJEB58431.
References
- Heliconius Genome Consortium. 2012. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 487:94–98.
- Abadi S, Azouri D, Pupko T, Mayrose I. 2019. Model selection may not be a mandatory step for phylogeny reconstruction. Nat Commun. 10(1):1–11.
- Alaei Kakhki N, Aliabadian M, Förschler MI, Ghasempouri SM, Kiabi BH, Verde Arregoitia LD, Schweizer M. 2018. Phylogeography of the Oenanthe hispanica–pleschanka–cypriaca complex (Aves, Muscicapidae: Saxicolinae): diversification history of open-habitat specialists based on climate niche models, genetic data, and morphometric data. Zoolog Syst Evol Res. 56(3):408–427.
- Alaei Kakhki N, Aliabadian M, Schweizer M. 2016. Out of Africa: biogeographic history of the open-habitat chats (Aves, Muscicapidae: Saxicolinae) across arid areas of the Old World. Zool Scr. 45(3):237–251.
- Aliabadian M, Kaboli M, Förschler MI, Nijman V, Chamani A, Tillier A, Prodon R, Pasquet E, Ericson PG, Zuccon D. 2012. Erratum to: convergent evolution of morphological and ecological traits in the open-habitat chat complex (Aves, Muscicapidae: Saxicolinae). Mol Phylogenet Evol. 65(3):35–45.
- Allio R, Schomaker-Bastos A, Romiguier J, Prosdocimi F, Nabholz B, Delsuc F. 2020. MitoFinder: efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Mol Ecol Resour. 20(4):892–905.
- Ané C, Larget B, Baum DA, Smith SD, Rokas A. 2007. Bayesian estimation of concordance among gene trees. Mol Biol Evol. 24(2):412–426.
- Arendt J, Reznick D. 2008. Convergence and parallelism reconsidered: what have we learned about the genetics of adaptation? Trends Ecol Evol. 23(1):26–32.
- Barrett RD, Schluter D. 2008. Adaptation from standing genetic variation. Trends Ecol Evol. 23(1):38–44.
- Besnard G, Muasya AM, Russier F, Roalson EH, Salamin N, Christin P-A. 2009. Phylogenomics of C4 photosynthesis in sedges (Cyperaceae): multiple appearances and genetic convergence. Mol Biol Evol. 26(8):1909–1919.
- Bouckaert RR, Drummond AJ. 2017. bModelTest: Bayesian phylogenetic site model averaging and model comparison. BMC Evol Biol. 17(1):1–11.
- Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N. 2019. BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol. 15(4):e1006650.
- Bruen TC, Philippe H, Bryant D. 2006. A simple and robust statistical test for detecting the presence of recombination. Genetics. 172(4):2665–2681.
- Brusatte SL, O’Connor JK, Jarvis ED. 2015. The origin and diversification of birds. Curr Biol. 25(19):R888–R898.
- Campagna L, Repenning M, Silveira LF, Fontana CS, Tubaro PL, Lovette IJ. 2017. Repeated divergent selection on pigmentation genes in a rapid finch radiation. Sci Adv. 3(5):e1602404.
- Chakrabarty P, Faircloth BC, Alda F, Ludt WB, McMahan CD, Near TJ, Dornburg A, Albert JS, Arroyave J, Stiassny ML. 2017. Phylogenomic systematics of ostariophysan fishes: ultraconserved elements support the surprising non-monophyly of Characiformes. Syst Biol. 66(6):881–895.
- Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 34(17):i884–i890.
- Chifman J, Kubatko L. 2014. Quartet inference from SNP data under the coalescent model. Bioinformatics. 30(23):3317–3324.
- Chou J, Gupta A, Yaduvanshi S, Davidson R, Nute M, Mirarab S, Warnow T. 2015. A comparative study of SVDquartets and other coalescent-based species tree estimation methods. BMC Genom. 16(10):1–11.
- Christin P-A, Salamin N, Savolainen V, Duvall MR, Besnard G. 2007. C4 photosynthesis evolved in grasses via parallel adaptive genetic changes. Curr Biol. 17(14):1241–1247.
- Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science. 307(5717):1928–1933.
- Cresko WA, Amores A, Wilson C, Murphy J, Currey M, Phillips P, Bell MA, Kimmel CB, Postlethwait JH. 2004. Parallel genetic basis for repeated evolution of armor loss in Alaskan threespine stickleback populations. Proc Natl Acad Sci U S A. 101(16):6050–6055.
- Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST. 2011. The variant call format and VCFtools. Bioinformatics. 27(15):2156–2158.
- Degnan JH, Rosenberg NA. 2006. Discordance of species trees with their most likely gene trees. PLoS Genet. 2(5):e68.
- Doyle JJ. 1997. Trees within trees: genes and species, molecules and morphology. Syst Biol. 46(3):537–553.
- Durand S, Bouché N, Strand EP, Loudet O, Camilleri C. 2012. Rapid establishment of genetic incompatibility through natural epigenetic variation. Curr Biol. 22(4):326–331.
- Durand EY, Patterson N, Reich D, Slatkin M. 2011. Testing for ancient admixture between closely related populations. Mol Biol Evol. 28(8):2239–2252.
- Edelman NB, Frandsen PB, Miyagi M, Clavijo B, Davey J, Dikow RB, García-Accinelli G, Van Belleghem SM, Patterson N, Neafsey DE. 2019. Genomic architecture and introgression shape a butterfly radiation. Science. 366(6465):594–599.
- Ellegren H, Smeds L, Burri R, Olason PI, Backström N, Kawakami T, Künstner A, Mäkinen H, Nadachowska-Brzyska K, Qvarnström A. 2012. The genomic landscape of species divergence in Ficedula flycatchers. Nature. 491(7426):756–760.
- Elmer KR, Meyer A. 2011. Adaptation in the age of ecological genomics: insights from parallelism and convergence. Trends Ecol Evol. 26(6):298–306.
- Enciso-Romero J, Pardo-Díaz C, Martin SH, Arias CF, Linares M, McMillan WO, Jiggins CD, Salazar C. 2017. Evolution of novel mimicry rings facilitated by adaptive introgression in tropical butterflies. Mol Ecol. 26(19):5160–5172.
- Eyre-Walker A, Keightley PD. 2007. The distribution of fitness effects of new mutations. Nat Rev Genet. 8(8):610–618.
- Faircloth BC, McCormack JE, Crawford NG, Harvey MG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Syst Biol. 61(5):717–726.
- Fitz-Gibbon S, Hipp AL, Pham KK, Manos PS, Sork VL. 2017. Phylogenomic inferences from reference-mapped and de novo assembled short-read sequence data using RADseq sequencing of California white oaks (Quercus section Quercus). Genome. 60(9):743–755.
- Funk DJ, Omland KE. 2003. Species-level paraphyly and polyphyly: frequency, causes, and consequences, with insights from animal mitochondrial DNA. Annu Rev Ecol Evol Syst. 34(1):397–423.
- Genner MJ, Turner GF. 2012. Ancient hybridization and phenotypic novelty within Lake Malawi’s cichlid fish radiation. Mol Biol Evol. 29(1):195–206.
- Gill F, Donsker D, Rasmussen P. 2020. IOC World Bird List (v10.1). IOC World Bird List [accessed 26 July 2020]. doi:10.14344/IOC.ML.10.1
- Grant PR, Grant BR, Petren K. 2005. Hybridization in the recent past. Am Nat. 166(1):56–67.
- Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y. 2010. A draft sequence of the Neandertal genome. Science. 328(5979):710–722.
- Gross JB, Borowsky R, Tabin CJ. 2009. A novel role for Mc1r in the parallel evolution of depigmentation in independent populations of the cavefish Astyanax mexicanus. PLoS Genet. 5(1):e1000326.
- Haffer J. 1977. Secondary contact zones of birds in northern Iran. Bonn Zool Monogr. 10:1–64.
- Han F, Lamichhaney S, Grant BR, Grant PR, Andersson L, Webster MT. 2017. Gene flow, ancient polymorphism, and ecological adaptation shape the genomic landscape of divergence among Darwin’s finches. Genome Res. 27(6):1004–1015.
- He C, Liang D, Zhang P. 2020. Asymmetric distribution of gene trees can arise under purifying selection if differences in population size exist. Mol Biol Evol. 37(3):881–892.
- Hedrick PW. 2013. Adaptive introgression in animals: examples and comparison to new mutation and standing variation as sources of adaptive variation. Mol Ecol. 22(18):4606–4618.
- Hermisson J, Pennings PS. 2005. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics. 169(4):2335–2352.
- Hibbins MS, Hahn MW. 2022. Phylogenomic approaches to detecting and characterizing introgression. Genetics. 220(2):iyab173.
- Hoang DT, Chernomor O, Von Haeseler A, Minh BQ, Vinh LS. 2018. UFBoot2: improving the ultrafast bootstrap approximation. Mol Biol Evol. 35(2):518–522.
- Hoballah ME, Gübitz T, Stuurman J, Broger L, Barone M, Mandel T, Dell’Olivo A, Arnold M, Kuhlemeier C. 2007. Single gene–mediated shift in pollinator attraction in Petunia. Plant Cell. 19(3):779–790.
- Hobolth A, Dutheil JY, Hawks J, Schierup MH, Mailund T. 2011. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Res. 21(3):349–356.
- Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SY, Faircloth BC, Nabholz B, Howard JT. 2014. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 346(6215):1320–1331.
- Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J, Swofford R, Pirun M, Zody MC, White S. 2012. The genomic basis of adaptive evolution in threespine sticklebacks. Nature. 484(7392):55–61.
- Junier T, Zdobnov EM. 2010. The Newick utilities: high-throughput phylogenetic tree processing in the Unix shell. Bioinformatics. 26(13):1669–1670.
- Kallal RJ, Kulkarni SS, Dimitrov D, Benavides LR, Arnedo MA, Giribet G, Hormiga G. 2021. Converging on the orb: denser taxon sampling elucidates spider phylogeny and new analytical methods support repeated evolution of the orb web. Cladistics. 37(3):298–316.
- Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 30(4):772–780.
- Konečná V, Bray S, Vlček J, Bohutínská M, Požárová D, Choudhury RR, Bollmann-Giolai A, Flis P, Salt DE, Parisod C. 2021. Parallel adaptation in autopolyploid Arabidopsis arenosa is dominated by repeated recruitment of shared alleles. Nat Commun. 12(1):1–13.
- Korunes KL, Samuk K. 2021. pixy: unbiased estimation of nucleotide diversity and divergence in the presence of missing data. Mol Ecol Resour. 21(4):1359–1368.
- Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 35(6):1547.
- Lamers RP, Muthukrishnan G, Castoe TA, Tafur S, Cole AM, Parkinson CL. 2012. Phylogenetic relationships among Staphylococcus species and refinement of cluster groups based on multilocus data. BMC Evol Biol. 12(1):1–15.
- Lamichhaney S, Han F, Webster MT, Andersson L, Grant BR, Grant PR. 2018. Rapid hybrid speciation in Darwin’s finches. Science. 359(6372):224–228.
- Lanfear R, Frandsen PB, Wright AM, Senfeld T, Calcott B. 2017. PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Mol Biol Evol. 34(3):772–773.
- Larget BR, Kotha SK, Dewey CN, Ané C. 2010. BUCKy: gene tree/species tree reconciliation with Bayesian concordance analysis. Bioinformatics. 26(22):2910–2911.
- Larsson A. 2014. AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics. 30(22):3276–3278.
- Lerner HR, Meyer M, James HF, Hofreiter M, Fleischer RC. 2011. Multilocus resolution of phylogeny and timescale in the extant adaptive radiation of Hawaiian honeycreepers. Curr Biol. 21(21):1838–1844.
- Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 27(21):2987–2993.
- Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The sequence alignment/map format and SAMtools. Bioinformatics. 25(16):2078–2079.
- Linkem CW, Minin VN, Leaché AD. 2016. Detecting the anomaly zone in species trees and evidence for a misleading signal in higher-level skink phylogeny (Squamata: Scincidae). Syst Biol. 65(3):465–477.
- Lutgen D, Ritter R, Olsen RA, Schielzeth H, Gruselius J, Ewels P, García JT, Shirihai H, Schweizer M, Suh A. 2020. Linked-read sequencing enables haplotype-resolved resequencing at population scale. Mol Ecol Resour. 20(5):1311–1322.
- Maddison WP. 1997. Gene trees in species trees. Syst Biol. 46(3):523–536.
- Malinsky M, Matschiner M, Svardal H. 2021. Dsuite - fast D-statistics and related admixture evidence from VCF files. Mol Ecol Resour. 21(2):584–595.
- Malinsky M, Svardal H, Tyers AM, Miska EA, Genner MJ, Turner GF, Durbin R. 2018. Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nat Ecol Evol. 2(12):1940–1955.
- Marques DA, Lucek K, Sousa VC, Excoffier L, Seehausen O. 2019a. Admixture between old lineages facilitated contemporary ecological speciation in Lake Constance stickleback. Nat Commun. 10(1):1–14.
- Marques DA, Meier JI, Seehausen O. 2019b. A combinatorial view on speciation and adaptive radiation. Trends Ecol Evol. 34(6):531–544.
- Martin A, Orgogozo V. 2013. The loci of repeated evolution: a catalog of genetic hotspots of phenotypic variation. Evolution. 67(5):1235–1250.
- Mayr E, Stresemann E. 1950. Polymorphism in the chat genus Oenanthe (Aves). Evolution. 4:291–300.
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20(9):1297–1303.
- Meier JI, Marques DA, Mwaiko S, Wagner CE, Excoffier L, Seehausen O. 2017. Ancient hybridization fuels rapid cichlid fish adaptive radiations. Nat Commun. 8(1):1–11.
- Meier JI, Marques DA, Wagner CE, Excoffier L, Seehausen O. 2018. Genomics of parallel ecological speciation in Lake Victoria cichlids. Mol Biol Evol. 35(6):1489–1506.
- Minh BQ, Hahn MW, Lanfear R. 2020a. New methods to calculate concordance factors for phylogenomic datasets. Mol Biol Evol. 37(9):2727–2733.
- Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, Von Haeseler A, Lanfear R. 2020b. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 37(5):1530–1534.
- Montejo-Kovacevich G, Meier JI, Bacquet CN, Warren IA, Chan YF, Kucka M, Salazar C, Rueda N, Montgomery SH, McMillan WO. 2021. Repeated genetic adaptation to high altitude in two tropical butterflies. bioRxiv. doi: 10.1101/2021.11.30.470630
- Natarajan C, Projecto-Garcia J, Moriyama H, Weber RE, Muñoz-Fuentes V, Green AJ, Kopuchian C, Tubaro PL, Alza L, Bulgarella M. 2015. Convergent evolution of hemoglobin function in high-altitude Andean waterfowl involves limited parallelism at the molecular sequence level. PLoS Genet. 11(12):e1005681.
- Nater A, Burri R, Kawakami T, Smeds L, Ellegren H. 2015. Resolving evolutionary relationships in closely related species with whole-genome sequencing data. Syst Biol. 64(6):1000–1017.
- Outlaw RK, Voelker G, Bowie RC. 2010. Shall we chat? Evolutionary relationships in the genus Cercomela (Muscicapidae) and its relation to Oenanthe reveals extensive polyphyly among chats distributed in Africa, India and the Palearctic. Mol Phylogenet Evol. 55(1):284–292.
- Panov E. 1992. Emergence of hybridogenous polymorphism in the Oenanthe picata complex. Bull Br Ornithol Club Centen Vol. 112:237–249.
- Panov EN. 2005. Wheatears of Palearctic. Pensoft.
- Pardo-Diaz C, Salazar C, Baxter SW, Merot C, Figueiredo-Ready W, Joron M, McMillan WO, Jiggins CD. 2012. Adaptive introgression across species boundaries in Heliconius butterflies. PLoS Genet. 8(6):e1002752.
- Paterson RS, Rybczynski N, Kohno N, Maddin HC. 2020. A total evidence phylogenetic analysis of pinniped phylogeny and the possibility of parallel evolution within a monophyletic framework. Front Ecol Evol. 7:457.
- Pease JB, Haak DC, Hahn MW, Moyle LC. 2016. Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biol. 14(2):e1002379.
- Peona V, Palacios-Gimenes O, Lutgen D, Olsen R, Alaei Kakhki N, Andriopoulos P, Bontzorlos V, Schweizer M, Suh A, Burri R. 2022. A chromosome-scale reference genome for eastern black-eared wheatear (Oenanthe melanoleuca). bioRxiv. doi: 10.1101/2022.12.22.521689
- Poelstra JW, Vijay N, Bossu CM, Lantz H, Ryll B, Müller I, Baglione V, Unneberg P, Wikelski M, Grabherr MG. 2014. The genomic landscape underlying phenotypic integrity in the face of gene flow in crows. Science. 344(6190):1410–1414.
- Preston JC, Hileman LC. 2009. Developmental genetics of floral symmetry evolution. Trends Plant Sci. 14(3):147–154.
- Protas ME, Hersey C, Kochanek D, Zhou Y, Wilkens H, Jeffery WR, Zon LI, Borowsky R, Tabin CJ. 2006. Genetic analysis of cavefish reveals molecular convergence in the evolution of albinism. Nat Genet. 38(1):107–111.
- Rambaut A, Drummond A. 2009. Tracer v1.5.0. Available from: http://beast.bio.ed.ac.uk/tracer.
- Randler C, Förschler MI, Gonzalez J, Aliabadian M, Bairlein F, Wink M. 2012. Phylogeography, pre-zygotic isolation and taxonomic status in the endemic Cyprus wheatear Oenanthe cypriaca. J Ornithol. 153(2):303–312.
- Roberts Kingman GA, Vyas DN, Jones FC, Brady SD, Chen HI, Reid K, Milhaven M, Bertino TS, Aguirre WE, Heins DC. 2021. Predicting future from past: the genomic basis of recurrent and rapid stickleback evolution. Sci Adv. 7(25):eabg5285.
- Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61(3):539–542.
- Roy SW. 2009. Phylogenomics: gene duplication, unrecognized paralogy and outgroup choice. PLoS One. 4(2):e4568.
- Salichos L, Stamatakis A, Rokas A. 2014. Novel information theory-based measures for quantifying incongruence among phylogenetic trees. Mol Biol Evol. 31(5):1261–1271.
- Sangster G, Alström P, Forsmark E, Olsson U. 2010. Multi-locus phylogenetic analysis of Old World chats and flycatchers reveals extensive paraphyly at family, subfamily and genus level (Aves: Muscicapidae). Mol Phylogenet Evol. 57(1):380–392.
- Sayyari E, Mirarab S. 2018. Testing for polytomies in phylogenetic species trees using quartet frequencies. Genes. 9(3):132.
- Sayyari E, Whitfield JB, Mirarab S. 2018. DiscoVista: interpretable visualizations of gene tree discordance. Mol Phylogenet Evol. 122:110–115.
- Schweizer M, Shirihai H. 2013. Phylogeny of the Oenanthe lugens complex (Aves, Muscicapidae: Saxicolinae): paraphyly of a morphologically cohesive group within a recent radiation of open-habitat chats. Mol Phylogenet Evol. 69(3):450–461.
- Schweizer M, Warmuth V, Alaei Kakhki N, Aliabadian M, Förschler M, Shirihai H, Suh A, Burri R. 2019a. Parallel plumage colour evolution and introgressive hybridization in wheatears. J Evol Biol. 32(1):100–110.
- Schweizer M, Warmuth VM, Kakhki NA, Aliabadian M, Förschler M, Shirihai H, Ewels P, Gruselius J, Olsen R-A, Schielzeth H, et al. 2019b. Genome-wide evidence supports mitochondrial relationships and pervasive parallel phenotypic evolution in open-habitat chats. Mol Phylogenet Evol. 139:106568.
- Seehausen O, Butlin RK, Keller I, Wagner CE, Boughman JW, Hohenlohe PA, Peichel CL, Saetre G-P, Bank C, Brännström Å. 2014. Genomics and the origin of species. Nat Rev Genet. 15(3):176–192.
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19):3210–3212.
- Smith SA, Moore MJ, Brown JW, Yang Y. 2015. Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants. BMC Evol Biol. 15(1):1–15.
- Song Y, Endepols S, Klemann N, Richter D, Matuschka F-R, Shih C-H, Nachman MW, Kohn MH. 2011. Adaptive introgression of anticoagulant rodent poison resistance by hybridization between Old World mice. Curr Biol. 21(15):1296–1301.
- Stahl BA, Gross JB. 2015. Alterations in Mc1r gene expression are associated with regressive pigmentation in Astyanax cavefish. Dev Genes Evol. 225(6):367–375.
- Stenz NW, Larget B, Baum DA, Ané C. 2015. Exploring tree-like and non-tree-like patterns using genome sequences: an example using the inbreeding plant species Arabidopsis thaliana (L.) Heynh. Syst Biol. 64(5):809–823.
- Stern DL. 2013. The genetic causes of convergent evolution. Nat Rev Genet. 14(11):751–764.
- Stryjewski KF, Sorenson MD. 2017. Mosaic genome evolution in a recent and rapid avian radiation. Nat Ecol Evol. 1(12):1912–1922.
- Suh A. 2016. The phylogenomic forest of bird trees contains a hard polytomy at the root of Neoaves. Zool Scr. 45:50–62.
- Suh A, Smeds L, Ellegren H. 2015. The dynamics of incomplete lineage sorting across the ancient adaptive radiation of neoavian birds. PLoS Biol. 13(8):e1002224.
- Swofford D. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sunderland (MA): Sinauer Associates.
- Tamura K, Stecher G, Kumar S. 2021. MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol. 38(7):3022–3027.
- Than C, Ruths D, Nakhleh L. 2008. PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinform. 9(1):1–16.
- Toews DP, Brelsford A. 2012. The biogeography of mitochondrial and nuclear discordance in animals. Mol Ecol. 21(16):3907–3930.
- Van Belleghem SM, Vangestel C, De Wolf K, De Corte Z, Möst M, Rastas P, De Meester L, Hendrickx F. 2018. Evolution at two time frames: polymorphisms from an ancient singular divergence event fuel contemporary parallel evolution. PLoS Genet. 14(11):e1007796.
- Van Damme K, Cornetti L, Fields PD, Ebert D. 2022. Whole-genome phylogenetic reconstruction as a powerful tool to reveal homoplasy and ancient rapid radiation in waterflea evolution. Syst Biol. 71(4):777–787.
- Vaughan TG. 2017. IcyTree: rapid browser-based visualization for phylogenetic trees and networks. Bioinformatics. 33(15):2392–2394.
- Wallbank RW, Baxter SW, Pardo-Diaz C, Hanly JJ, Martin SH, Mallet J, Dasmahapatra KK, Salazar C, Joron M, Nadeau N. 2016. Evolutionary novelty in a butterfly wing pattern through enhancer shuffling. PLoS Biol. 14(1):e1002353.
- Whittall JB, Voelckel C, Kliebenstein DJ, Hodges SA. 2006. Convergence, constraint and the role of gene expression during adaptive radiation: floral anthocyanins in Aquilegia. Mol Ecol. 15(14):4645–4657.
- Xie KT, Wang G, Thompson AC, Wucherpfennig JI, Reimchen TE, MacColl ADC, Schluter D, Bell MA, Vasquez KM, Kingsley DM. 2019. DNA fragility in the parallel evolution of pelvic reduction in stickleback fish. Science. 363(6422):81–84.
- Yu Y, Dong J, Liu KJ, Nakhleh L. 2014. Maximum likelihood inference of reticulate evolutionary histories. Proc Natl Acad Sci U S A. 111(46):16448–16453.
- Yu Y, Nakhleh L. 2015. A maximum pseudo-likelihood approach for phylogenetic networks. BMC Genom. 16(10):1–10.
- Zhang C, Rabiee M, Sayyari E, Mirarab S. 2018. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinform. 19(6):15–30.
- Zuccon D, Ericson PG. 2010. A multi-gene phylogeny disentangles the chat-flycatcher complex (Aves: Muscicapidae). Zool Scr. 39(3):213–224.