PLOS Biology. 2022 Dec 20;20(12):e3001914. doi: 10.1371/journal.pbio.3001914

Rapid and predictable genome evolution across three hybrid ant populations

Pierre Nouhaud 1,*, Simon H Martin 2, Beatriz Portinha 1,3, Vitor C Sousa 3, Jonna Kulmuni 1,4,*
Editor: Leonie C Moyle 5
PMCID: PMC9767332  PMID: 36538502

Abstract

Hybridization is frequent in the wild but it is unclear when admixture events lead to predictable outcomes and if so, at what timescale. We show that selection led to correlated sorting of genetic variation rapidly after admixture in 3 hybrid Formica aquilonia × F. polyctena ant populations. Removal of ancestry from the species with the lowest effective population size happened in all populations, consistent with purging of deleterious load. This process was modulated by recombination rate variation and the density of functional sites. Moreover, haplotypes with signatures of positive selection in either species were more likely to fix in hybrids. These mechanisms led to mosaic genomes with comparable ancestry proportions. Our work demonstrates predictable evolution over short timescales after admixture in nature.


Hybridization between species is frequent in the wild but it is unclear when admixture events lead to predictable outcomes and if so, at what timescale. This study shows that selection contributes to correlated sorting of genetic variation across three independent hybrid wood ant populations in less than 50 generations.

Introduction

Hybridization is widespread and has shaped the genomes of many extant species, representing a major source of evolutionary novelties [1–3]. Understanding the evolution of hybrid genomes is important because it can shed light on how species barriers become established and on the fitness costs (e.g., incompatibilities) and benefits (e.g., heterosis) of hybridization, and can help us better understand the function of genes and their interactions [4]. Variation in local ancestry patterns along hybrid genomes has been found across many taxa, including sunflowers [5], monkeyflowers [6], humans [7], swordtail fish [8,9], sparrows [10], butterflies [11,12], and maize [13]. Such variation in local ancestry reflects the interaction of recombination with neutral (e.g., drift, migration) and selective processes. Selection may lead to the fixation (adaptive introgression [1]) or the purging (incompatibilities; genetic load in one hybridizing species [14,15]) of one ancestry component. Recombination rate variation can modulate the effects of selection, for example, by enabling faster purging of deleterious alleles in low-recombining regions [14]. Admixture landscapes are also shaped by past demography and stochastic events, such as bottlenecks or initial admixture proportions [16]. These mechanisms can lead to the fixation or near-fixation of one ancestry component at a given locus within a hybrid population, a process we refer to as sorting of genetic variation (also termed genome stabilization [5,12,17]). A few previous studies have investigated the interplay of different neutral and selective factors across multiple admixture events, revealing predictable sorting of ancestral variation in replicated hybrid populations [9,10,12]. However, while theory predicts that the efficiency of selection on introgressed variation will decrease quickly after admixture [7,15,18], the timescale of sorting in the wild is still unclear (but see [19]).

Here, we took advantage of multiple hybrid populations between the 2 wood ant species Formica aquilonia and F. polyctena to measure how rapidly and predictably admixed genomes evolve in the wild and to identify the key factors that determine this predictability. These 2 species are polygynous, with up to several hundred queens per nest. In these species, a population is a supercolony, with dozens of interconnected nests and low relatedness between individuals [20]. Although differentiation between nests within a population is low in polygynous and supercolonial species, differentiation between populations is high, likely reflecting budding as the main dispersal mode (i.e., dispersing on foot and building a new nest in the vicinity of an already established nest, reviewed in [21]). Long-distance dispersal happens via temporary social parasitism, where a single mated queen (or possibly a few) enters the nest of an unrelated species, executes the resident queen, and uses the local workforce to raise her first brood [21]. Several hybrid F. aquilonia × F. polyctena populations with distinct mitochondrial sequences have previously been characterized in Southern Finland [22], providing a test case for the outcomes of admixture in nature.

Results and discussion

We generated whole-genome sequence data from 3 hybrid populations (Fig 1A, n = 39) and used genomes from both species collected within and outside their overlapping range (n = 10 per species [23]; mean coverage: 20.6×, S1 Table). Analyzing ca. 1.6 million single-nucleotide polymorphisms (SNPs) genome-wide, we confirmed that the hybrid populations were genetically intermediate between F. aquilonia and F. polyctena (Figs 1B, 1C, and S1 and S1 Table). Both Bunkkeri and Pikkala individuals carried distinct F. aquilonia-like mitotypes (Fig 1D). F. polyctena-like mitotypes were observed in the Långholmen population (Fig 1D), where 2 hybrid lineages termed W and R coexist (S1 Fig, [23]). These lineages essentially share a single mitotype and are possibly maintained through environment-dependent genetic incompatibilities and assortative mating [24,25]. The 3 hybrid populations have highly differentiated mitotypes (108 mutational steps between the F. aquilonia-like and F. polyctena-like clusters, Fig 1D) and nuclear DNA (mean pairwise FST estimates between hybrid populations ranging from 0.18 to 0.23), but low mitotype diversity within each population (≤2 mitotypes). These results are consistent with population bottlenecks during colony establishment coupled with little long-distance dispersal, as described above.

Fig 1. Young, independently evolving hybrid wood ant populations between F. aquilonia and F. polyctena in Southern Finland.


(A) Sampling sites across Europe (eCH: East Switzerland, wCH: West Switzerland; map base layer from Natural Earth: https://www.naturalearthdata.com/downloads/110m-physical-vectors). (B) Principal component analysis of 46,886 nuclear SNPs (5 kbp-thinned, MAC ≥ 2). PC1 discriminates between the 2 species and PC2 between hybrid populations. Colors and symbols as in panel A. (C) sNMF estimation of individual ancestry coefficients for K = 2 and K = 6 populations (the cross-entropy criterion gives K = 6 as the best K; see S1 Table for detailed admixture proportions computed from sNMF (K = 2), LOTER, and naive chromosome painting outputs). (D) Haplotype network derived from 199 mitochondrial SNPs. Circles represent haplotypes, with sizes proportional to their frequency. Dashes indicate the number of mutational steps, with numbers ≥5 provided. The black arrow indicates 2 Finnish F. polyctena individuals carrying F. aquilonia-like mitotypes. (E) Admixture history between F. aquilonia and F. polyctena inferred through an SFS-based approach. The question marks represent the uncertainty associated with the admixture model: parameter estimates are comparable under both single origin and independent origins scenarios, but single origin models support separation of hybrid populations after brief periods of shared ancestry. Ne: average effective population size in number of haploids; m: migration rate. (F) Results of the model choice analysis performed with FASTSIMCOAL2 for each population pair. SO: single origin scenario; IO: independent origins scenario; IOm: independent origins scenario with migration between hybrid populations after admixture. The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3. MAC, minor allele count; SFS, site-frequency spectrum; SNP, single-nucleotide polymorphism.

To elucidate the ancestry of the hybrid populations and date admixture events, we reconstructed the demographic histories of pairs of hybrid populations using a coalescent approach based on the site-frequency spectrum (SFS) of nuclear SNPs (FASTSIMCOAL2 [26]). Coalescent analyses support balanced admixture proportions between species (i.e., no apparent minor ancestry; Fig 1E and S2–S11 Tables), with comparable parameter estimates under scenarios assuming a single origin (SO, 1 admixture event) or independent origins (IO, multiple admixture events; S7–S11 Tables). Consistent with field observations, assuming that hybridization occurred within the last 50 generations led to higher likelihoods than assuming older admixture events, with admixture time estimates ranging from 14 to 47 generations ago (S2–S11 Tables). Model choice provided more support for an IO scenario for the Bunkkeri–LångholmenW pair (median relative likelihoods: LIO = 0.99, LSO = 0.01; Fig 1F) and the Bunkkeri–Pikkala pair (LIO = 0.88, LSO = 0.10; Fig 1F), and for an SO scenario for the LångholmenW–LångholmenR pair (LIO = 0, LSO = 1.00; Fig 1F). Results were inconclusive for the remaining pair (Pikkala–LångholmenW: LIO = 0.43, LSO = 0.37; Fig 1F). Parameter estimates from models that assume SO indicate that even if the hybrid populations originate from a single admixture event, they mostly evolved independently (on average 9.5 generations of shared ancestry since admixture, S7–S11 Tables). Following the admixture events, no significant gene flow was inferred either between hybrid populations or between hybrids and either parental species (LIOm < 0.12 in all comparisons, S7–S11 Tables and Fig 1F). These demographic reconstructions and patterns of mitochondrial variation are consistent with 2 alternative scenarios for the origin of these hybrid populations. Either they arose through independent hybridization events, or an ancestral hybrid population combining several matrilines from both species was established and then split into 3 locations spanning 60 km within <50 generations. Considering wood ant reproductive and dispersal biology, we suggest that independent admixture events (IO) are the more likely scenario, in line with model choice supporting IO for 2 of the 4 population comparisons. However, we acknowledge that reconstructing very recent events accurately is challenging, and we therefore evaluate our results in light of both the IO and SO scenarios.

To investigate how evolution has shaped hybrid genomes after admixture, we mapped ancestry components along chromosomes independently for each hybrid population. To do this, we inferred local ancestry at 1.5 million phased SNPs using LOTER (Fig 2A [27]) and quantified tree topology weights in 100-SNP windows with TWISST (Fig 2B [28]). Hybrid populations have strongly correlated admixture landscapes along the genome (i.e., local ancestry in 1 population predicts local ancestry in another; Figs 2D and S3; Spearman’s rank correlation coefficients computed from TWISST weights ranging from 0.51 to 0.62, P < 10^−15 for all population pairs). To test whether such predictability would be expected under neutrality, we used MSPRIME [29] to simulate neutral admixture events under both SO and IO scenarios for each hybrid population pair, using the demographic parameters inferred with FASTSIMCOAL2 for that pair. In all instances (4 population pairs × 2 admixture scenarios), neutral simulations produced balanced contributions of both ancestry components along the genome but did not reproduce the clear local deviations towards either ancestry component (i.e., sorting) observed in our empirical data (Fig 2C–2E; two-sample Kolmogorov–Smirnov tests, P < 10^−15 for all populations; S3 and S4 Figs and S12 and S13 Tables). Thus, the extent of correlated sorting among hybrid populations cannot be explained solely by neutral processes, including the admixture scenario (SO versus IO) and/or demographic history. Moreover, both SFS-based demographic modeling (Fig 1F) and the lack of mitochondrial haplotype sharing between hybrid populations (Fig 1D) argue against gene flow as the source of the observed parallelism. Other mechanisms must therefore be invoked to explain the rapid evolution of sorted and correlated admixture landscapes in the different hybrid populations.
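The two statistics used above can be sketched in plain Python. This is a minimal illustration, assuming per-window ΔWEIGHT values (F. aquilonia minus F. polyctena topology weighting) are already available as lists ordered by the same windows; the function names are hypothetical, and in practice library routines such as SciPy's `spearmanr` and `ks_2samp`, which also return P-values, would normally be used.

```python
from bisect import bisect_right


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov D: max distance between empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    d = 0.0
    # Both ECDFs only change at observed values, so checking those suffices.
    for v in s1 + s2:
        cdf1 = bisect_right(s1, v) / len(s1)
        cdf2 = bisect_right(s2, v) / len(s2)
        d = max(d, abs(cdf1 - cdf2))
    return d
```

`spearman_rho` on two populations' ΔWEIGHT lists measures how well ancestry in one population predicts ancestry in the other, while `ks_statistic` compares observed and simulated weighting distributions. Neither function computes a P-value.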

Fig 2. Sorting of genetic variation is more correlated than expected under neutrality across 3 hybrid wood ant populations.


Examples of (A) local ancestry and (B) topology weighting patterns (green: hybrids locally related to F. aquilonia; yellow: hybrids locally related to F. polyctena) inferred independently in each population along 1 pseudo-chromosome (Scaffold 13). (C) Excess of extreme topology weightings in observed compared to simulated data (IO, SO) in all populations. Empirical and simulated distributions were compared with two-sample Kolmogorov–Smirnov tests (D: test statistic; P: P-value). (D) Genome-wide comparison of topology weighting differences between each population pair (ΔWEIGHT: F. aquilonia weighting minus F. polyctena weighting, computed per population over 14,890 100-SNP windows, gray circles). The regression line is shown in white. ρ: Spearman’s correlation coefficient; P: P-value of the Spearman’s correlation test. (E) A larger fraction of the genome is sorted in observed compared to simulated data in all populations (sorting measured as the absolute F. aquilonia or F. polyctena weighting; see S3 Fig for detailed results per pairwise comparison). The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3. IO, independent origins; SO, single origin.

We hypothesized that correlated sorting of genetic variation in hybrid populations is caused by selection against deleterious alleles that have accumulated in the hybridizing species with the lower effective population size (Ne [7,8,30,31]). This effect is expected to be stronger in gene-dense regions [8,32] but also in low-recombining regions [14], where tighter linkage between deleterious alleles, and between neutral and deleterious alleles, leads to the removal of larger tracts of ancestry. The 2 Formica species were estimated to have contrasting effective population sizes, with a ca. 30% lower Ne in F. polyctena than in F. aquilonia over the last 200,000 generations ([23], Fig 1E). In hybrid populations, sorting (hereafter defined as ≥90% of either ancestry component inferred by LOTER at a given locus) was faster in low-recombining regions of the genome, as well as in gene-rich regions (Figs 3A and S5). Moreover, in low-recombining regions, the F. aquilonia ancestry was preferentially fixed in all populations (Fig 3A). Focusing on coding SNPs, we found a significant enrichment for F. aquilonia ancestry genome-wide in all populations, consistent with the hypothesis that hybrid populations have purged the deleterious load accumulated in F. polyctena due to its smaller Ne (genomic permutations, P < 0.002 in all populations, Fig 3B). These results support a contribution of both recombination rate variation and genetic load to the sorting of ancestral variation in hybrids, as previously characterized in other study systems (reviewed in [33]). Our study also reveals that consistent sorting of ancestral variation can happen in less than 50 generations in small populations (Fig 1E).
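The sorting criterion above (≥90% of one ancestry component at a locus) can be sketched as a per-window classification, summarized by recombination-rate quantile in the spirit of Fig 3A. This is a minimal illustration rather than the study's pipeline: it assumes each window has a single mean local ancestry value (0: F. aquilonia fixed, 1: F. polyctena fixed, as in Fig 3B) and a recombination rate estimate, and the function names are hypothetical.

```python
def classify_window(polyctena_ancestry, threshold=0.9):
    """Return 'aquilonia', 'polyctena', or None (unsorted) for one window.

    polyctena_ancestry: mean local ancestry in [0, 1]; 0 means the
    F. aquilonia component is fixed, 1 means F. polyctena is fixed.
    """
    if polyctena_ancestry >= threshold:
        return "polyctena"
    if 1.0 - polyctena_ancestry >= threshold:
        return "aquilonia"
    return None


def sorting_by_quantile(ancestry, recomb_rate, n_bins=4, threshold=0.9):
    """Fraction of windows sorted (overall and per direction) in each recombination quantile bin."""
    if len(ancestry) != len(recomb_rate) or len(ancestry) < n_bins:
        raise ValueError("need one recombination rate per window and at least n_bins windows")
    # Window indices ordered from lowest to highest recombination rate.
    order = sorted(range(len(recomb_rate)), key=lambda i: recomb_rate[i])
    summary = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        calls = [classify_window(ancestry[i], threshold) for i in idx]
        n = len(calls)
        summary.append({
            "bin": b,
            "sorted": sum(c is not None for c in calls) / n,
            "aquilonia": calls.count("aquilonia") / n,
            "polyctena": calls.count("polyctena") / n,
        })
    return summary
```

Cross-tabulating by gene density quantiles as well, as in the Fig 3A heatmap, would only require a second binning key.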

Fig 3. Sorting of ancestral polymorphism in hybrids is driven by recombination rate variation and genetic load.


(A) Heatmap showing the fraction of sorted 20 kbp windows and the direction of sorting (ancestry component fixed) as a function of recombination rate and gene density quantiles in each hybrid population. (B) Coding regions are significantly enriched for the F. aquilonia ancestry component in all hybrid populations (P < 0.002). For each population (panels as in A), local ancestry (y-axis; 0: F. aquilonia allele fixed, 1: F. polyctena allele fixed) is plotted as a function of the fraction of SNPs within CDS (x-axis). Confidence intervals (gray) were obtained using 500 genomic permutations (white line: median of the permutation approach). The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3. CDS, coding sequence; SNP, single-nucleotide polymorphism.

Positive selection could also contribute to the correlated sorting of genetic variation across hybrid populations: advantageous alleles from either species could repeatedly sweep to fixation in distinct hybrid populations after admixture. Under this scenario, a genomic region fixed for the F. aquilonia ancestry component in hybrids would display signatures of selection in F. aquilonia individuals. To test this hypothesis, we looked for selective sweep signatures in both hybridizing species with RAiSD [34], which combines changes in the SFS, levels of linkage disequilibrium, and genetic diversity along the genome into the composite sweep statistic μ (Fig 4). Consistently sorted genomic windows (i.e., windows fixed for the same species’ ancestry across all hybrid populations, 1.92% of the windows overall) displayed significantly higher sweep statistics only in the species whose ancestry component was fixed in hybrids (genomic permutations, P < 0.001, Fig 4B). Although recombination rate estimates were significantly lower in these consistently sorted windows than in the rest of the genome (Wilcoxon test, W = 1,740,660, P < 10^−15), purging of load in low-recombining regions cannot explain why hybrids have fixed ancestry from the species in which a sweep may have occurred prior to hybridization (Fig 4B).
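The enrichment test above compares μ in consistently sorted windows with μ in randomly drawn windows. A minimal sketch of such a permutation test follows, assuming per-window μ values from RAiSD are already aggregated into a list. The sampling scheme (random windows drawn without replacement) is an assumption and may differ from the genomic permutations used in the study; a circular shift of window positions along the genome would preserve spatial clustering more closely. The function name is hypothetical.

```python
import random
from statistics import median


def permutation_test_median(mu, focal_idx, n_perm=1000, seed=1):
    """One-sided permutation test: is the median mu in focal windows unusually high?

    mu: sweep statistic for every genomic window (a list).
    focal_idx: indices of consistently sorted windows.
    Returns (observed median, permutation P-value).
    """
    rng = random.Random(seed)
    observed = median(mu[i] for i in focal_idx)
    k = len(focal_idx)
    # Null distribution: medians of random window sets of the same size.
    null = [median(rng.sample(mu, k)) for _ in range(n_perm)]
    # Add-one correction so the P-value is never exactly zero.
    p = (1 + sum(m >= observed for m in null)) / (n_perm + 1)
    return observed, p
```

With 1,000 permutations the smallest attainable P-value under this correction is 1/1,001, which is consistent in magnitude with the P < 0.001 threshold reported above.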

Fig 4. Signatures of selective sweeps in hybridizing species predict the direction of sorting in admixed genomes.


(A) Distribution of selective sweep statistics (μ) computed over 20 kbp windows in F. aquilonia (left) and F. polyctena (right). Genome-wide μ distribution (gray) and observed values in windows fixed for either F. aquilonia (n = 104 windows, green) or F. polyctena ancestry components (n = 98, yellow) in all hybrid populations. (B) Windows fixed for either F. aquilonia (left) or F. polyctena (right) ancestry components in all hybrid populations are significantly enriched for high μ values in F. aquilonia or F. polyctena individuals, respectively. This suggests that a haplotype with a signature of positive selection in either species is more likely to fix in hybrids. Simulated μ values were obtained through 1,000 genomic permutations (as in [64]). Each circle represents medians computed over all consistently sorted windows (solid: observed, open: simulated). The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

Hybrid genomes provide powerful insights into evolution because they are exposed to strong, and often opposing, selective forces [33]. In this study, we coupled the reconstruction of admixture histories, local ancestry inference, and coalescent simulations to show that the sorting of ancestral variation is predictable and, in some instances, likely independent across several natural hybrid ant populations. Some predictability has previously been characterized in other systems (e.g., [9]); for example, introgression is limited on sex chromosomes compared to autosomes in replicated hybrid populations of both Italian sparrows [10] and Lycaeides butterflies [12]. We also documented that the known interplay between negative selection and recombination rate variation contributes to a remarkable correlation of ancestry components along the genome between hybrid wood ant populations (Spearman’s rank correlation coefficients using TWISST ranging from 0.51 to 0.62, Fig 2D). Because ancestry proportions are balanced in hybrid wood ants, negative selection should not target a minor ancestry component, as would be expected under unbalanced ancestry proportions and highly polygenic species barriers (e.g., [8]). Instead, our results suggest that negative selection acts on ancestry from the species with the smaller effective population size, and presumably a higher load of deleterious alleles. Distinguishing signatures of incompatibilities from those of genetic load, and their possible interplay, remains a challenge for future studies.

We also showed, to our knowledge for the first time, that events of positive selection prior to admixture likely contribute to the predictability of admixture outcomes (see [35] for a theoretical treatment): genomic regions displaying signatures of selective sweeps in 1 hybridizing species tend to fix the same ancestry component in hybrid populations. These genomic regions could also act as incompatibilities, which we cannot identify from our data alone but which shape the landscape of introgression in hybrids [9], as previously documented in our study system [23,36]. In the future, novel methodological developments [37,38] coupled with larger sample sizes may allow the identification of candidate incompatibilities in hybrid wood ants.

Finally, in contrast to other recent studies of hybrid genome evolution, the ant hybrids still show balanced ancestry contributions after ca. 50 generations since admixture. Fluctuating, environment-dependent selection could be one mechanism maintaining both ancestry components in hybrids, as microsatellite allele frequencies of the cold-adapted F. aquilonia species have been shown to positively correlate with yearly temperature over a 16-year time period in one of the hybrid populations we studied [25]. As this correlation was stronger in males, haplodiploidy is another mechanism that may contribute to the maintenance of genetic variation in wood ants. To conclude, we have shown that the sorting of ancestral genetic variation in hybrid genomes can occur rapidly and predictably after admixture due to both positive and purifying selection.

Methods

Sampling

F. aquilonia, F. polyctena, and their Finnish hybrids are polygynous: within a nest, the reproductive effort is shared across dozens or hundreds of egg-laying queens. These 2 species and their Finnish hybrids are also supercolonial, with populations (i.e., supercolonies) formed by the association of several cooperating nests within a site. Polygyny and supercoloniality both result in low relatedness among individuals sampled within a given population [39]. We sampled hybrid individuals from 3 populations previously mapped in Southern Finland: Långholmen (composed of 2 hybrid lineages, R and W [23]), Bunkkeri, and Pikkala [22]. We collected unmated queens from Bunkkeri (n = 10) and Långholmen (nW = 10, nR = 9) in spring 2018 and workers from Pikkala (n = 10) in spring 2015 (S1 Table). Data generated by Portinha and colleagues [40] were used as F. aquilonia and F. polyctena reference panels. Briefly, this dataset consists of 10 workers (diploid females) per species sampled from several monospecific colonies across Europe (Fig 1A and S1 Table). F. polyctena samples were collected at 2 locations in Switzerland (East, n = 3; West, n = 3), the Åland Islands (Finland, n = 3), and Southern Finland (n = 1), and F. aquilonia samples were collected in Scotland (UK, n = 3), East Switzerland (n = 3), Central Finland (n = 3), and Southern Finland (n = 1). Both species are sympatric in Southern Finland, while they can be considered allopatric in the other sampling locations (in East Switzerland, they occur at different altitudes [40]).

DNA extraction

Both the hybrid samples generated for this study and the F. aquilonia and F. polyctena samples from Portinha and colleagues [40] were processed and sequenced at the same time, and all data went through the same pipeline. DNA was extracted with a sodium dodecyl sulfate (SDS) protocol from whole bodies, and sequencing libraries were built with NEBNext DNA Library Prep Kits (New England Biolabs) by Novogene (Hong Kong).

DNA sequencing and read mapping

Unless stated otherwise, all software was used with default parameters. Whole-genome sequencing was carried out on an Illumina NovaSeq 6000 (150 bp paired-end reads), targeting 15× coverage per individual (S1 Table). We trimmed raw Illumina reads and removed adapter sequences with TRIMMOMATIC v0.38, and mapped trimmed reads against the F. aquilonia × F. polyctena reference genome [41] using BWA MEM v0.7.17 [41]. We then removed duplicates using PICARD TOOLS v2.21.4 (http://broadinstitute.github.io/picard). All bioinformatic scripts are available from https://github.com/pi3rrr3/antmixture.

SNP calling and filtering

We called SNPs jointly across all samples with FREEBAYES v1.3.1 (population priors disabled with the -k option [42]) and normalized the resulting VCF file (parsimonious left-alignment of multi-nucleotide variants) using VT v0.5 [43]. Using BCFTOOLS v1.10 [44], we excluded both sites located less than 2 base pairs from indels and sites supported only by forward or only by reverse reads. We decomposed multi-nucleotide variants using vcfallelicprimitives from VCFLIB v1.0.1. The next steps were carried out with BCFTOOLS. Biallelic SNPs with quality equal to or higher than 30 were kept. Individual genotypes with (i) genotype quality lower than 30 and/or (ii) depth of coverage lower than 8 were coded as missing data. Sites displaying more than 50% missing data over all samples were discarded. Genotyping errors due to, e.g., misaligned reads were removed using a filter based on excessive heterozygosity. To do so, we used an approach similar to Pfeifer and colleagues [45]: we pooled all samples together and excluded sites displaying heterozygote excess (P < 0.01, --hardy option of VCFTOOLS v0.1.16 [46]). Since putative genetic incompatibilities should not be heterozygous in hybridizing F. aquilonia and F. polyctena individuals, they should not be affected by this filtering step. Overall, 122,044 sites were removed, half of them located on unanchored, repeat-rich contigs and displaying heterozygosity ≥0.48 in all populations. We then filtered sites based on individual sequencing depth distributions at SNP loci, setting genotypes as missing where depth was lower than half or higher than twice the mean value of the individual considered. Finally, sites with more than 15% missing data over all samples were discarded. These steps led to a final dataset of 1,659,532 SNPs across 59 individuals.
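The last two filtering steps (individual depth bounds, then a site-level missingness cap) were run with BCFTOOLS in the study; the logic can be sketched in plain Python to make the thresholds explicit. This is a minimal illustration, assuming per-individual lists of depths and genotypes at the same SNP loci, with missing genotypes coded as `None`; the function names are hypothetical.

```python
def mask_by_depth(depths, genotypes, low=0.5, high=2.0):
    """Set genotypes to missing (None) where depth falls outside [low, high] x the individual's mean depth.

    depths, genotypes: one list per individual, each with one value per SNP.
    """
    masked = []
    for ind_depths, ind_gts in zip(depths, genotypes):
        # Mean depth of this individual across SNP loci.
        mean_depth = sum(ind_depths) / len(ind_depths)
        masked.append([
            gt if low * mean_depth <= d <= high * mean_depth else None
            for d, gt in zip(ind_depths, ind_gts)
        ])
    return masked


def keep_sites(genotypes, max_missing=0.15):
    """Indices of sites whose fraction of missing genotypes is <= max_missing."""
    n_ind = len(genotypes)
    n_sites = len(genotypes[0])
    keep = []
    for s in range(n_sites):
        missing = sum(genotypes[i][s] is None for i in range(n_ind))
        if missing / n_ind <= max_missing:
            keep.append(s)
    return keep
```

Applying `mask_by_depth` before `keep_sites` mirrors the order described above: genotypes are masked per individual first, and only then are sites with too much missing data dropped.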

Population structure

Population structure was assessed using a reduced dataset of 46,896 SNPs obtained after thinning (retaining 1 SNP every 5 kbp) and discarding sites with a minor allele count (MAC) <2. We performed a principal component analysis (PCA) with PLINK v1.9 [47] and sNMF clustering using the LEA package v3.0.0 [48] in R v3.6.2 [49]. Clustering was carried out for numbers of ancestral components (K) ranging from 1 to 10, with 10 iterations per K value. The lowest cross-entropy was obtained with K = 6, and the runs with the lowest cross-entropy for both K = 2 and K = 6 are shown in Fig 1C.
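The thinning and MAC filters above can be sketched as follows. This is a minimal illustration, assuming SNPs are sorted by scaffold and position and genotypes are coded as diploid alternative-allele dosages (0, 1, 2, or `None` if missing). The order of operations (thinning first, then the MAC filter) follows the wording above but is an assumption, since the software used for thinning is not specified; the function names are hypothetical.

```python
def minor_allele_count(genotypes):
    """MAC from diploid alt-allele dosages (0, 1, 2); None marks missing data."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    return min(alt, 2 * len(called) - alt)


def thin_then_filter(sites, window=5000, min_mac=2):
    """Keep at most 1 SNP per `window` bp per scaffold, then drop SNPs with MAC < min_mac.

    sites: list of (scaffold, position, genotypes), sorted by scaffold then position.
    Returns the (scaffold, position) pairs that survive both filters.
    """
    last_kept = {}
    thinned = []
    for chrom, pos, gts in sites:
        # Skip SNPs closer than `window` bp to the previously kept SNP.
        if chrom in last_kept and pos - last_kept[chrom] < window:
            continue
        last_kept[chrom] = pos
        thinned.append((chrom, pos, gts))
    return [(c, p) for c, p, g in thinned if minor_allele_count(g) >= min_mac]
```

Reversing the two steps would generally retain more SNPs, because thinning would then only choose among sites that already pass the MAC filter.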

Mitotype network

Mitochondrial SNPs were called separately with FREEBAYES using a frequency-based approach (--pooled-continuous option). SNP filtering was carried out using the same pipeline as for the nuclear genome, which led to the identification of 199 biallelic SNPs across 59 individuals. Individual FASTA files were written using vcf-consensus and aligned with MAFFT v7.429 [50]. The median-joining network was created using POPART [51].
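A haplotype network such as the one in Fig 1D starts from distinct haplotypes, their frequencies (circle sizes), and pairwise numbers of differences (mutational steps). The sketch below computes only those inputs, assuming aligned, equal-length sequences keyed by individual; it does not implement the median-joining algorithm used by POPART, which additionally infers unsampled median vectors. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collapse_haplotypes(sequences):
    """Map each distinct aligned sequence to the number of individuals carrying it.

    sequences: dict of individual ID -> aligned sequence string.
    """
    return Counter(sequences.values())


def mutational_steps(hap1, hap2):
    """Number of differing sites between two aligned, equal-length haplotypes."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(a != b for a, b in zip(hap1, hap2))


def pairwise_steps(haplotypes):
    """Mutational steps for every unordered pair of distinct haplotypes."""
    return {
        (h1, h2): mutational_steps(h1, h2)
        for h1, h2 in combinations(sorted(haplotypes), 2)
    }
```

For example, `collapse_haplotypes({"ind1": "ACGT", "ind2": "ACGT", "ind3": "ACCA"})` yields 2 carriers of `ACGT` and 1 of `ACCA`, which differ by 2 mutational steps.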

Demographic modeling

Before reconstructing admixture histories, we removed the 122,479 SNPs located on the third chromosome. This chromosome carries a supergene controlling whether Formica colonies are headed by 1 or multiple queens (social chromosome [52]). Recombination reduction between the 2 supergene variants leads to the maintenance of ancestral polymorphisms across Formica species that could bias our demographic inference. The dataset used for demographic modeling hence comprises 1,537,053 SNPs.

We used the composite-likelihood method implemented in FASTSIMCOAL2 v2.6 [26] to compare alternative demographic models and to infer demographic parameters from the SFS, following Portinha and colleagues [40]. We ran each model 100 times with 80 iterations per run for likelihood maximization, and the expected SFS was approximated through 200,000 coalescent simulations per iteration. We assumed a mutation rate of 3.5 × 10^−9 per bp per haploid genome per generation, which is an average based on estimates currently available for social insects [53]. No population growth was allowed, but population sizes could change when migration rates changed. Generation time was assumed to be 2.5 years [40]. Finally, we used the speciation history inferred by Portinha and colleagues [40] to constrain parameter ranges prior to the admixture event(s) in our demographic models. This speciation history was inferred from Finnish F. aquilonia and F. polyctena individuals, including those used in the present study. All parameter ranges are indicated in S2 Table (three-population models) and S7 Table (four-population models).

Field observations suggest that the hybrid populations may have arisen through recent admixture (ca. 50 years ago). Constraining admixture to the last 50 generations (ca. 125 years ago) led to models with higher expected likelihoods for all hybrid populations; both constrained and unconstrained results are shown in S3–S6 Tables.
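The conversion between generations and calendar years used here is a simple scaling by the assumed generation time of 2.5 years. The short sketch below (function names hypothetical) makes that arithmetic explicit.

```python
GENERATION_TIME_YEARS = 2.5  # generation time assumed in the Methods


def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert a time in generations to calendar years."""
    return generations * generation_time


def years_to_generations(years, generation_time=GENERATION_TIME_YEARS):
    """Convert a time in calendar years to generations."""
    return years / generation_time
```

Under this assumption, the 50-generation constraint corresponds to 125 years, the ca. 50 years suggested by field observations correspond to about 20 generations, and the 14 to 47 generations estimated for admixture correspond to roughly 35 to 118 years.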

SFS characteristics

We built folded SFSs using minor allele frequency (MAF) and downsampled genotypes to minimize missing data, using R scripts available at https://github.com/vsousa/EG_cE3c/tree/master/CustomScripts/Fastsimcoal_Example_Bootstrap/Scripts_VCFtoSFS. We first determined a minimum sample size across all sites (number of individuals available for resampling minus maximum number of missing data per site). We then resampled individuals in 50 kbp windows and discarded blocks where the mean distance between 2 consecutive SNPs within a block was <2 bp. We estimated the number of monomorphic sites from the proportion of polymorphic sites and the total number of callable sites. The latter was obtained from each individual BAM file using MOSDEPTH v0.2.9 [54] and individual sequencing depth thresholds defined for SNP calling.
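The core of building a folded SFS with downsampling can be sketched as follows. This is a simplified illustration, not the R scripts referenced above: it resamples individuals independently at each site rather than per 50 kbp window, assumes diploid alternative-allele dosages (0, 1, 2, or `None` if missing), and leaves out the estimation of monomorphic sites from callable-site counts, which would be added to bin 0 separately. The function name is hypothetical.

```python
import random


def folded_sfs(genotypes, n_sample, seed=1):
    """Folded SFS from diploid dosages, downsampling each site to n_sample individuals.

    genotypes: list of sites; each site is a list of per-individual alt-allele
    dosages (0, 1, 2) or None if missing. Sites with fewer than n_sample called
    individuals are skipped. Returns counts of minor-allele counts 0..n_sample,
    i.e., folded over 2 * n_sample chromosomes.
    """
    rng = random.Random(seed)
    sfs = [0] * (n_sample + 1)
    for site in genotypes:
        called = [g for g in site if g is not None]
        if len(called) < n_sample:
            continue
        # Draw n_sample individuals without replacement and count alt alleles.
        alt = sum(rng.sample(called, n_sample))
        minor = min(alt, 2 * n_sample - alt)
        sfs[minor] += 1
    return sfs
```

Choosing `n_sample` as the minimum number of called individuals across sites, as described above, means no site needs to be discarded for missing data.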

Distinct 3D- and 4D-SFSs were built for our three- and four-population models, respectively, to answer specific study questions (see below). In both cases, we used the single individual from each species sampled in Southern Finland as representative of their respective species. For each hybrid population, we resampled 4 individuals every window to build the SFSs. The 3D-SFSs contained information of both species and 1 focal hybrid population, while the 4D-SFSs included information of both species and 2 focal hybrid populations. For these latter models, we analyzed 4 different pairwise combinations: Bunkkeri—LångholmenW, Pikkala—LångholmenW, Bunkkeri–Pikkala, and finally LångholmenW—LångholmenR.

Disentangling between secondary contact and hybridization: Three-population models

For each hybrid population, we tested 3 different scenarios that could lead to present-day admixed individuals. The first scenario is hybridization, namely an admixture event between F. polyctena and F. aquilonia where one species would contribute a genetic input of α into the hybrid population, while the other species would contribute the remaining fraction 1 - α. This scenario was assessed both with and without gene flow between hybrids and either (or both) species after admixture (forward in time).
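Under the hybridization scenario, a single admixture pulse sets the initial allele frequency of the hybrid population at each locus to a weighted mix of the two parental frequencies. The sketch below shows that expectation, assuming a single instantaneous pulse, no subsequent drift or selection, and, as in the four-population models, that F. polyctena contributes the fraction α. Function names are hypothetical.

```python
def admixed_frequency(p_polyctena, p_aquilonia, alpha):
    """Expected allele frequency in the hybrid population right after one admixture pulse.

    alpha: fraction of the hybrid gene pool contributed by F. polyctena;
    F. aquilonia contributes the remaining 1 - alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_polyctena + (1.0 - alpha) * p_aquilonia


def admixed_frequencies(p_pol_loci, p_aq_loci, alpha):
    """Apply admixed_frequency locus by locus."""
    return [admixed_frequency(p, q, alpha) for p, q in zip(p_pol_loci, p_aq_loci)]
```

After the pulse, drift and selection move each locus away from this expectation; the neutral MSPRIME simulations described in the Results quantify how far drift alone is expected to move it.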

The second scenario is secondary contact, where after the speciation event hybrid ancestors would first diverge from one species, and then receive (haploid) migrants from the other species. This scenario was tested in both directions (i.e., assuming both ((F. polyctena, hybrid), F. aquilonia) and ((F. aquilonia, hybrid), F. polyctena) topologies). Gene flow was also allowed between both species before and after the split between the hybrid population and the first species.

The third and last scenario is a trifurcation model, where the 2 species and the hybrid ancestral population would first diverge simultaneously, after which all populations would exchange migrants at different rates. Higher migration from both species into the hybrid ancestral population would lead to admixed individuals in the current-day hybrid population.

Disentangling between single and independent origins of hybrid populations: Four-population models

Hybrid populations that arose through a single admixture event followed by a long period of shared ancestry would have more correlated sorting of genetic variation than if they separated soon after the admixture event or arose through independent admixture events. To distinguish between these scenarios, we tested 2 alternative admixture models using 2 hybrid populations at a time. Based on the results of our three-population models, admixture times were constrained to the last 50 generations in all subsequent models.

The first model is a single origin (SO) scenario where F. polyctena contributes a proportion α of the genetic material of the ancestral hybrid population, with F. aquilonia providing the complementary 1 - α fraction. This ancestral hybrid population then diverges into 2 hybrid populations.

The second model is an independent origins (IO) scenario where each hybrid population arises through a distinct admixture event, with possibly different contributions from both species (i.e., F. polyctena contributes a fraction α for the first hybrid population and β for the second hybrid population, and F. aquilonia 1 - α and 1 - β, respectively). As post-admixture gene flow between hybrid populations could also lead to correlated sorting of genetic variation, we additionally tested an IO with migration (IOm) scenario that includes reciprocal migration between the 2 hybrid populations after the most recent admixture event.

Model choice

We performed model choice using relative likelihoods computed based on Akaike information criterion (AIC) to disentangle between SO, IO, and IOm scenarios for each hybrid population pair, following Excoffier and colleagues [55]. To minimize the impact of linkage, observed SFSs were built as described previously for parameter estimation, but using a pruned dataset, keeping every 100th SNP (18,378 SNPs in total) after filtering sites where at least 4 genotypes were available per hybrid population. We computed the AIC for 100 bootstrap replicates, resampling individuals for each SNP, using R scripts available at https://github.com/vsousa/EG_cE3c/tree/master/CustomScripts/Fastsimcoal_Example_Bootstrap/Scripts_VCFtoSFS. For each bootstrap replicate, likelihoods were computed based on average expected SFSs simulated 100 times using the maximum-likelihood estimates of each model, with 200,000 coalescent simulations run per replicate. Both IO and SO models had 8 parameters, while the IOm model had 10.
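The AIC-based relative likelihoods can be sketched as Akaike weights. This minimal version assumes maximum likelihoods reported in log10 units (as fastsimcoal2 does), so they are converted to natural logarithms before computing AIC = 2k − 2 ln L. The likelihood values in the example are hypothetical; in the procedure described above the computation would be repeated for each of the 100 bootstrap replicates.

```python
import math


def akaike_weights(log10_likelihoods, n_params):
    """Relative likelihoods (Akaike weights) of competing models.

    log10_likelihoods: maximum log10 likelihood per model name.
    n_params: number of estimated parameters per model name.
    """
    aic = {m: 2 * n_params[m] - 2 * math.log(10) * ll for m, ll in log10_likelihoods.items()}
    best = min(aic.values())
    rel = {m: math.exp((best - a) / 2) for m, a in aic.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical log10 likelihoods; parameter counts as stated above (SO and IO: 8, IOm: 10).
weights = akaike_weights({"SO": -1000.0, "IO": -995.0, "IOm": -994.8},
                         {"SO": 8, "IO": 8, "IOm": 10})
```

Here IOm fits marginally better than IO in raw likelihood, but its 2 extra parameters outweigh that gain, so IO receives most of the weight.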

Population recombination rate estimation

We used iSMC v0.0.23 [56] to estimate population recombination rates along the genome. We hypothesized that the recombination landscape in hybrids would be an average of the recombination landscapes in both species. Using all non-Finnish individuals from both species jointly, we fitted a model of coalescence with recombination including 40 TMRCA (time to the most recent common ancestor) intervals and 10 ρ categories. Population recombination rate estimates were collected over non-overlapping 20 kbp windows, discarding windows with fewer than 20 SNPs.
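The windowing step can be sketched as follows, assuming per-site estimates with 0-based coordinates on a single scaffold; the function name is hypothetical and the actual iSMC output format is not reproduced. The same fixed 20 kbp windowing is also applied elsewhere in these methods (LOTER local ancestry, RAiSD μ), where `min_sites` would be set according to the filter used there.

```python
from collections import defaultdict


def window_means(positions, values, window_size=20_000, min_sites=20):
    """Average per-site values in non-overlapping windows of one scaffold.

    positions: 0-based coordinates; values: estimate at each position.
    Windows with fewer than min_sites sites are discarded.
    Returns {window_start: mean_value}.
    """
    bins = defaultdict(list)
    for pos, val in zip(positions, values):
        bins[pos // window_size].append(val)
    return {idx * window_size: sum(v) / len(v)
            for idx, v in sorted(bins.items()) if len(v) >= min_sites}
```

With `min_sites=2`, three sites at positions 100, 200, and 25,000 yield a single retained window starting at 0 (mean of the first two values), because the second window holds only one site.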

Haplotype estimation

Prior to mapping ancestry components, we phased all SNPs anchored on scaffolds (78.2% of the genome) with WHATSHAP v1.0 [57], which uses short-range information contained within paired-end reads. We then performed statistical phasing and imputation with SHAPEIT v4.1.2 [58], using the sequencing data setting and increasing the MCMC iteration scheme as indicated in SHAPEIT documentation. The total phased dataset contained 1,490,364 SNPs.

Outgroup information

We used F. exsecta as an outgroup to root topologies inferred with TWISST (see below). This species belongs to a distinct species group (F. exsecta group) while F. aquilonia and F. polyctena both belong to the F. rufa group [59]. To extract F. exsecta genotypes at our phased SNP loci, we mapped data previously generated by Dhaygude and colleagues [60] against the same reference genome used in the present study. These data consist of Illumina paired-end, 2 × 100 bp reads generated from a pool of 50 (haploid) males (9.89 Gbp overall, median insert size: 469 bp, ENA accession SAMN07344806). Reads were trimmed with TRIMMOMATIC using the same parameters as before, mapped with BWA MEM, and deduplicated with PICARD TOOLS. We filtered proper read pairs with mapping quality ≥20 (86.4% of the reads, sequencing depth 14.5×) using SAMTOOLS v1.13 [61]. We generated a pileup file (disabling per-base alignment quality computation and filtering minimum base quality ≥20) and finally extracted F. exsecta genotypes at each locus of the phased dataset using a custom R script.
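The genotype extraction from the pileup was done with a custom R script that is not described in detail. As a hedged illustration only, the Python sketch below parses the base column of a samtools mpileup line and assigns the outgroup the better-supported of the reference and alternative alleles at a phased SNP. The majority-allele rule is an assumption (reasonable for a pool of haploid males), and base-quality filtering is assumed to have been applied when generating the pileup.

```python
import re
from collections import Counter


def count_pileup_bases(ref_base, bases):
    """Count A/C/G/T observations in the base column of a samtools mpileup line.

    '.' and ',' match the reference; '^' is followed by a mapping-quality character;
    '+N...' and '-N...' encode indels of length N; '$', '*', '#', '<', '>' and 'N' are ignored.
    """
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":  # start of a read: skip '^' and its mapping-quality character
            i += 2
        elif c in "+-":  # indel: skip sign, length digits and the indel sequence
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        else:
            if c in ".,":
                counts[ref_base.upper()] += 1
            elif c.upper() in "ACGT":
                counts[c.upper()] += 1
            i += 1
    return counts


def call_outgroup_allele(ref, alt, counts):
    """Return 0 (ref) or 1 (alt) for the better-supported allele; None if tied or uncovered."""
    n_ref, n_alt = counts[ref], counts[alt]
    if n_ref == n_alt:
        return None
    return 0 if n_ref > n_alt else 1
```

For a reference base A and the base string `..,^]t$+2ACT-1Ga`, the parser counts 4 A and 2 T observations (the inserted `AC` and deleted `G` are skipped), so the outgroup is assigned the reference allele.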

Mapping ancestry components along the genome

We used 3 different methods to map ancestry variation along the genome, all of which relied on reference panels comprising both F. aquilonia and F. polyctena individuals. As gene flow from F. aquilonia to F. polyctena in Finland prior to admixture [40] could bias our results, we ran all 3 methods after excluding the Finnish representatives of both species from our reference panels (S1 Table).

We performed local ancestry inference from phased data for each population independently using LOTER v1.0 [27], after which we averaged local ancestries over the same non-overlapping 20 kbp windows used for population recombination rate estimation.

We also recorded topology weightings in 100-SNP windows along the genome with TWISST v0.2 [28] using phased data. When jointly analyzing the 2 species, the outgroup F. exsecta, and 1 hybrid population at a time, the 3 possible rooted topologies are (Fig 2B):

  1. (((F. aquilonia, hybrid), F. polyctena), outgroup),

  2. (((F. polyctena, hybrid), F. aquilonia), outgroup),

  3. (((F. aquilonia, F. polyctena), hybrid), outgroup).

Since the average weighting of the third topology was around 10% genome-wide in all hybrid populations, we measured topology weighting differences per window by subtracting the weighting of the second topology (F. polyctena topology) from the weighting of the first topology (F. aquilonia topology; Fig 2C and 2D). The resulting metric, ΔWEIGHT., ranges from −1 (only the F. polyctena topology is inferred in a given window) to +1 (only the F. aquilonia topology is inferred). When ΔWEIGHT. is close to zero, both topologies have similar weightings, which we interpreted as a lack of sorting of ancestral variation in hybrids.
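Computed per window, the metric is a difference of normalized topology weights. This sketch assumes TWISST-style per-window counts for the 3 topologies listed above, in that order, normalizes them to weights summing to 1, and follows the sign convention of this paragraph (F. aquilonia weight minus F. polyctena weight). The function name is hypothetical.

```python
def delta_weight(topology_counts):
    """ΔWEIGHT for one window from topology counts.

    topology_counts: counts for (F. aquilonia topology, F. polyctena topology,
    species topology), i.e. topologies 1, 2 and 3. Counts are normalised to weights,
    then the F. polyctena weight is subtracted from the F. aquilonia weight.
    """
    total = sum(topology_counts)
    w_aq, w_pol, _ = (c / total for c in topology_counts)
    return w_aq - w_pol
```

A window where only the F. polyctena topology is observed returns −1.0, while a window dominated by the third topology with equal aquilonia and polyctena weights returns 0.0 and would be read as unsorted.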

Finally, we also performed a “naive” chromosome painting approach from non-phased genotypes. To do so, we first discarded sites with more than 2 missing genotypes over all reference individuals, after which we identified 79,336 ancestry-informative SNPs displaying an allele frequency difference above 80% between the two species.
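The marker selection and painting can be sketched as below, assuming diploid genotypes coded as alternate-allele counts with `None` for missing calls. The filters (at most 2 missing genotypes over reference individuals, frequency difference above 0.8) follow the text; how each hybrid genotype is then converted into an ancestry value is an assumption here: the number of copies of the allele more frequent in F. polyctena, divided by 2, giving 0 for F. aquilonia and 1 for F. polyctena. Function names are hypothetical.

```python
def allele_freq(genotypes):
    """Alternate-allele frequency from diploid genotypes (0/1/2); None if nothing is called."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def find_aims(aq_sites, pol_sites, max_missing=2, min_diff=0.8):
    """Return (site_index, alt_is_polyctena) for ancestry-informative sites."""
    aims = []
    for i, (aq, pol) in enumerate(zip(aq_sites, pol_sites)):
        n_missing = sum(g is None for g in aq) + sum(g is None for g in pol)
        if n_missing > max_missing:
            continue
        p_aq, p_pol = allele_freq(aq), allele_freq(pol)
        if p_aq is None or p_pol is None:
            continue
        if abs(p_pol - p_aq) > min_diff:
            aims.append((i, p_pol > p_aq))
    return aims


def paint(hybrid_genotypes, aims):
    """Ancestry per AIM for one diploid hybrid (0 = F. aquilonia, 1 = F. polyctena)."""
    ancestry = {}
    for i, alt_is_pol in aims:
        g = hybrid_genotypes[i]
        if g is None:
            continue
        pol_copies = g if alt_is_pol else 2 - g
        ancestry[i] = pol_copies / 2
    return ancestry
```

Averaging these per-marker values over windows gives painting-based local ancestry estimates comparable to LOTER output.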

Coalescent simulations

We used coalescent simulations to measure sorting levels that would be expected in each hybrid population under the reconstructed admixture history. For each population pair used for demographic reconstruction, we simulated both IO and SO scenarios (4 population pairs × 2 scenarios). We used the parameters inferred under each scenario for each population pair with FASTSIMCOAL2 (parameters: divergence and admixture times, Ne estimates, size changes, migration rates, and admixture proportions) and ran simulations in MSPRIME v1.0.2 [29], modeling 100 non-recombining 10 kbp blocks with a recombination rate of 10⁻⁶ within blocks. Each simulation was run 100 times, and VCF files were produced via MSPRIME assuming a mutation rate of 3.5 × 10⁻⁹ per bp per haploid genome per generation [53] (S12 and S13 Tables and S4 Fig). Topology weightings were computed directly from tree sequences using TWISST and ΔWEIGHT. distributions were obtained as stated above over all 100 replicates. MSPRIME scripts are available from https://github.com/pi3rrr3/antmixture.

Selective sweep detection

We looked for evidence of selective sweeps separately in each of the two hybridizing species with RAiSD v2.9 [34] using the full dataset. The composite selective sweep statistic μ was estimated with default parameters using non-Finnish individuals from each species. Results were then averaged over 20 kbp non-overlapping windows in R (Fig 4A).

Genomic permutation approach

We tested statistical significance of both (i) the enrichment in F. aquilonia ancestry at coding SNPs (Fig 3B) and (ii) the association between the direction of sorting and the evidence for selective sweeps in 1 species (Fig 4B) using a similar shift-based, circular permutation scheme inspired by Yassin and colleagues [62].

For the first analysis, we slid the local ancestry landscape inferred by LOTER at the SNP level by 100 kbp increments (i.e., five 20 kbp windows at a time), maintaining the structure of ancestry blocks as observed in our data. For each shift replicate, we then computed the fraction of coding SNPs within each local ancestry bin genome-wide. For this analysis, we ran 500 permutations, and P-values were defined as the proportion of shift replicates in which the fraction of coding SNPs was at least as high as in our empirical data. The 95th and 99th quantiles plotted in Fig 3B were computed over all 500 permutations.

For the second analysis, we slid the average local ancestry values computed over 20 kbp non-overlapping windows by 100 kbp increments, which also shifted the location of sorted windows. For each shift replicate, we then computed the median composite selective sweep statistic μ across all sorted windows genome-wide for each ancestry component independently. We ran 1,000 permutations and defined P-values as the proportion of shift replicates in which median μ values were at least as high as observed in the empirical data (Fig 4B).
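Both analyses share the same circular-shift logic: a per-window annotation is rotated along the genome in fixed increments while the values it is tested against stay in place, which preserves the block structure of the annotation. The sketch below assumes windows concatenated in genome order, a step of 5 windows (5 × 20 kbp = 100 kbp), and a user-supplied statistic; it is a hypothetical illustration, not the authors' R code, and optionally subsamples shifts when `n_perm` is smaller than the number of possible shifts.

```python
import random
import statistics


def circular_shift_test(labels, values, statistic, step=5, n_perm=None, seed=1):
    """Shift-based circular permutation test.

    labels: per-window annotation to rotate (e.g. sorted or not), in genome order.
    values: per-window values kept in place (e.g. sweep statistic mu).
    Returns (observed, null_values, p_value); p is the fraction of shifts whose
    statistic is at least as high as the observed one.
    """
    observed = statistic(labels, values)
    shifts = list(range(step, len(labels), step))
    if n_perm is not None and n_perm < len(shifts):
        shifts = random.Random(seed).sample(shifts, n_perm)
    null = []
    for s in shifts:
        shifted = labels[-s:] + labels[:-s]  # rotate by s windows
        null.append(statistic(shifted, values))
    p_value = sum(v >= observed for v in null) / len(null)
    return observed, null, p_value


def median_in_sorted(labels, values):
    """Median value over windows labelled True (e.g. median mu in sorted windows)."""
    return statistics.median(v for lab, v in zip(labels, values) if lab)
```

In a toy genome of 20 windows where the 5 sorted windows coincide with the only high μ values, every shifted replicate gives a lower median, so P = 0.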

Density of functional sites and gene content in sorted regions

The density of functional sites was computed in 20 kbp non-overlapping windows along the genome by measuring, for each window, the fraction of base pairs falling within coding sequences (CDS), whose positions were extracted from the GFF annotation file [63].
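A minimal sketch of this measure, assuming GFF3-formatted lines (tab-separated, 1-based inclusive coordinates, feature type in column 3). Overlapping CDS records, for example from alternative transcripts, are merged first so that no base pair is counted twice; whether the original analysis merged overlaps is an assumption. The scaffold name used in the example is hypothetical.

```python
def cds_intervals(gff_lines, seqid):
    """Merged CDS intervals (0-based, half-open) for one sequence from GFF3 lines."""
    raw = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[0] != seqid or cols[2] != "CDS":
            continue
        raw.append((int(cols[3]) - 1, int(cols[4])))  # GFF is 1-based, inclusive
    merged = []
    for start, end in sorted(raw):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping CDS records
        else:
            merged.append([start, end])
    return merged


def cds_fraction(merged, window_start, window_size=20_000):
    """Fraction of base pairs in [window_start, window_start + window_size) covered by CDS."""
    window_end = window_start + window_size
    covered = sum(max(0, min(end, window_end) - max(start, window_start))
                  for start, end in merged)
    return covered / window_size
```

Two overlapping CDS records spanning bases 1–100 and 51–150 cover 150 bp, i.e. a density of 0.0075 in a 20 kbp window.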

The 202 sorted windows (defined as windows displaying ≥ 90% of the same ancestry component across all hybrid populations; see Fig 4) intersected with 364 gene models; however, no significant enrichment of Gene Ontology terms was detected with TOPGO v1.0 [64].
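The sorted-window definition can be written as a simple rule. This sketch assumes window-averaged local ancestry coded as in LOTER (0 for F. aquilonia, 1 for F. polyctena), one value per hybrid population; the function name is hypothetical.

```python
def classify_window(ancestries, threshold=0.9):
    """Label a window as sorted if every hybrid population carries >= threshold of one ancestry.

    ancestries: mean local ancestry per hybrid population (0 = F. aquilonia, 1 = F. polyctena).
    Returns the fixed ancestry component, or None if the window is not sorted.
    """
    if all(a >= threshold for a in ancestries):
        return "F. polyctena"
    if all(1 - a >= threshold for a in ancestries):
        return "F. aquilonia"
    return None
```

Counting sorted windows then amounts to `sum(classify_window(w) is not None for w in windows)`, where `windows` holds one tuple of per-population ancestries per 20 kbp window.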

Supporting information

S1 Fig. Visualization of the first 5 principal components of the principal component analysis performed over 46,886 SNPs genome-wide (5 kb-thinned, MAC ≥ 2).

The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

(TIF)

S2 Fig. Comparison of ancestry mapping approaches.

For each hybrid population (columns), the panels show TWISST ΔWEIGHT. statistics vs. LOTER local ancestry estimates (first row, 14,890 100-SNP windows), TWISST ΔWEIGHT. statistics vs. naive chromosome painting local ancestry estimates (PAINTING, second row, 5,529 windows with at least 5 ancestry-informative SNPs), and LOTER vs. naive chromosome painting local ancestry estimates (third row, 5,529 windows with at least 5 ancestry-informative SNPs). ΔWEIGHT. ranges from −1, if all topologies in the window group the hybrid population with F. aquilonia, to +1, if they all group it with F. polyctena. LOTER and naive chromosome painting are both SNP-based (results averaged over windows) and code ancestries as 0 for F. aquilonia and 1 for F. polyctena. In each panel, the regression line is shown in white. ρ: Spearman’s correlation coefficient; P: P-value of Spearman’s correlation test. The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

(TIF)

S3 Fig. Observed (OBS.) and simulated levels of local ancestry correlation (left, each point is a chromosome) and sorting (right) for each hybrid population pair (rows).

The degree of sorting is measured as the absolute F. aquilonia or F. polyctena weighting. IO: independent origins scenario, SO: single origin scenario (100 independent runs per scenario). The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

(TIF)

S4 Fig. Principal component analyses of observed and simulated datasets for each hybrid population pair (rows).

Observed PCAs were obtained as per Fig 1 (5 kb-thinned SNP data, minor allele count ≥2). Simulations were run with msprime using parameter estimates inferred under both single and independent origins scenarios with fastsimcoal2 and assuming a mutation rate of 3.5 × 10⁻⁹. One run was randomly picked per simulated scenario. The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

(TIF)

S5 Fig.

Distribution of LOTER local ancestry estimates (x-axis, 0: fixed for F. aquilonia ancestry component, 1: fixed for F. polyctena ancestry component) across recombination rate (upper row) and gene density (lower row) quartiles in each hybrid population (columns), computed over 20 kbp non-overlapping windows. Medians are indicated with black dots. The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

(TIF)

S1 Table. Sample information, sequencing statistics, hybrid indices, and accession numbers for each sample analyzed in the study.

In the caste column, w: worker and q: young unmated queen. Principal component coordinates (PC1 and PC2) and F. aquilonia admixture proportions (sNMF clustering analysis run with K = 2, sNMF_K2_Faq) are both based on a 5 kbp-thinned dataset of 46,896 SNPs with MAC ≥ 2. HI_loter values are based on 1,490,364 phased SNPs, with non-Finnish species samples used as reference panels. HI_paint values are based on 79,336 ancestry-informative markers displaying allele frequency differences ≥ 80% between non-Finnish species samples.

(XLSX)

S2 Table. Demographic parameters estimated by fastsimcoal2 in demographic model analyses.

Unless bounded, the upper limit of the search range could be exceeded. Each model used only a subset of these parameters. The time of admixture parameter (TADMS) indicated with an asterisk (*) was first unconstrained and then constrained to test for recent hybridization (<50 generations). Double asterisks (**) mark parameters whose calculation differs between models. The alternative minimum and maximum bounds are displayed in the respective columns.

(XLSX)

S3 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid population sampled in Pikkala (contained 348,228 sites).

All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,377,703.973. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

(XLSX)

S4 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid population sampled in Bunkkeri (contained 463,401 sites).

All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,821,814.189. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

(XLSX)

S5 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid population sampled in LångholmenW (contained 289,948 sites).

All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,144,658.203. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

(XLSX)

S6 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid population sampled in LångholmenR (contained 223,927 sites).

All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −883,471.568. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

(XLSX)

S7 Table. Demographic parameters estimated by fastsimcoal2 in demographic model analyses.

Unless bounded, the upper limit of the search range could be exceeded. Each model used only a subset of these parameters.

(XLSX)

S8 Table. Maximum-likelihood parameter estimates for the models concerning the history of the F. aquilonia × F. polyctena hybrid populations sampled in Bunkkeri and Pikkala (contained 328,913 sites).

All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,509,459.108. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

(XLSX)

S9 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid populations sampled in Bunkkeri and LångholmenW (contained 282,215 sites).

All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,291,308.181. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

(XLSX)

S10 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid populations sampled in Pikkala and LångholmenW (contained 218,545 sites).

All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,001,623.462. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

(XLSX)

S11 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid populations sampled in LångholmenW and LångholmenR (contained 150,207 sites).

All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −693,262.061. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

(XLSX)

S12 Table. Observed and simulated average levels of diversity (π) for each population in each comparison.

Simulated estimates were generated with msprime using both models inferred with fastsimcoal2 (SO: single origin scenario, IO: independent origins scenario). Simulated estimates were averaged over 100 independent runs.

(XLSX)

S13 Table. Observed and simulated average levels of divergence (FST) between populations within each comparison.

Simulated estimates were generated with msprime using both models inferred with fastsimcoal2 (SO: single origin scenario, IO: independent origins scenario). Simulated estimates were averaged over 100 independent runs.

(XLSX)

Acknowledgments

We thank G. Barroso for assistance with iSMC, the SpecIAnt group for feedback, and CSC–IT Center for Science, Finland, for computational resources. This work was performed under the Global Ant Genomics Alliance.

Abbreviations

AIC

Akaike information criterion

CDS

coding sequence

IO

independent origins

MAC

minor allele count

MAF

minor allele frequency

PCA

principal component analysis

SDS

sodium dodecyl sulfate

SFS

site-frequency spectrum

SNP

single-nucleotide polymorphism

SO

single origin

TMRCA

time to the most recent common ancestor

Data Availability

All FASTQ files are available on ENA under project PRJEB55288 (hybrid samples) and PRJEB51899 (F. aquilonia & F. polyctena samples). VCF files, FASTSIMCOAL2 files and scripts, and statistics computed over genomic windows are available from figshare: https://doi.org/10.6084/m9.figshare.c.6140793.v3. Bioinformatic and MSPRIME scripts are available from https://github.com/pi3rrr3/antmixture.

Funding Statement

This work was supported by Academy of Finland (www.aka.fi) no. 328961 and HiLIFE (www2.helsinki.fi/en/helsinki-institute-of-life-science) grants to JK. SHM was supported by a Royal Society University Research Fellowship URF\R1\180682 (www.royalsociety.org). VCS was supported by Fundação Ciência e Tecnologia CEECINST/00032/2018/CP1523/CT0008 and UIDB/00329/2020 grants (www.fct.pt). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Oziolor EM, Reid NM, Yair S, Lee KM, Guberman VerPloeg S, Bruns PC, et al. Adaptive introgression enables evolutionary rescue from extreme environmental pollution. Science. 2019;364:455–457. doi: 10.1126/science.aav4155 [DOI] [PubMed] [Google Scholar]
  • 2.Suarez-Gonzalez A, Lexer C, Cronk QCB. Adaptive introgression: a plant perspective. Biol Lett. 2018;14:20170688. doi: 10.1098/rsbl.2017.0688 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Taylor SA, Larson EL. Insights from genomes into the evolutionary importance and prevalence of hybridization in nature. Nat Ecol Evol. 2019;3:170–177. doi: 10.1038/s41559-018-0777-y [DOI] [PubMed] [Google Scholar]
  • 4.Powell DL, García-Olazábal M, Keegan M, Reilly P, Du K, Díaz-Loyo AP, et al. Natural hybridization reveals incompatible alleles that cause melanoma in swordtail fish. Science. 2020;368:731–736. doi: 10.1126/science.aba5216 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Buerkle CA, Rieseberg LH. The rate of genome stabilization in homoploid hybrid species. Evolution. 2008;62:266–275. doi: 10.1111/j.1558-5646.2007.00267.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Brandvain Y, Kenney AM, Flagel L, Coop G, Sweigart AL. Speciation and introgression between Mimulus nasutus and Mimulus guttatus. PLoS Genet. 2014;10:e1004410. doi: 10.1371/journal.pgen.1004410 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Harris K, Nielsen R. The Genetic Cost of Neanderthal Introgression. Genetics. 2016;203:881–891. doi: 10.1534/genetics.116.186890 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Schumer M, Xu C, Powell DL, Durvasula A, Skov L, Holland C, et al. Natural selection interacts with recombination to shape the evolution of hybrid genomes. Science. 2018;360:656–660. doi: 10.1126/science.aar3684 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Langdon QK, Powell DL, Kim B, Banerjee SM, Payne C, Dodge TO, et al. Predictability and parallelism in the contemporary evolution of hybrid genomes. PLoS Genet. 2022;18:e1009914. doi: 10.1371/journal.pgen.1009914 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Runemark A, Trier CN, Eroukhmanoff F, Hermansen JS, Matschiner M, Ravinet M, et al. Variation and constraints in hybrid genome formation. Nat Ecol Evol. 2018;2:549–556. doi: 10.1038/s41559-017-0437-7 [DOI] [PubMed] [Google Scholar]
  • 11.Edelman NB, Frandsen PB, Miyagi M, Clavijo B, Davey J, Dikow RB, et al. Genomic architecture and introgression shape a butterfly radiation. Science. 2019;366:594–599. doi: 10.1126/science.aaw2090 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Chaturvedi S, Lucas LK, Buerkle CA, Fordyce JA, Forister ML, Nice CC, et al. Recent hybrids recapitulate ancient hybrid outcomes. Nat Commun. 2020;11:2179. doi: 10.1038/s41467-020-15641-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Calfee E, Gates D, Lorant A, Perkins MT, Coop G, Ross-Ibarra J. Selective sorting of ancestral introgression in maize and teosinte along an elevational cline. PLoS Genet. 2021;17:e1009810. doi: 10.1371/journal.pgen.1009810 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Kim BY, Huber CD, Lohmueller KE. Deleterious variation shapes the genomic landscape of introgression. PLoS Genet. 2018;14:e1007741. doi: 10.1371/journal.pgen.1007741 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Veller C, Edelman NB, Muralidhar P, Nowak MA. Recombination and selection against introgressed DNA. bioRxiv. 2021:846147. doi: 10.1101/846147 [DOI] [PubMed] [Google Scholar]
  • 16.Duranton M, Pool JE. Interactions Between Natural Selection and Recombination Shape the Genomic Landscape of Introgression. Mol Biol Evol. 2022:39. doi: 10.1093/molbev/msac122 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Rieseberg LH, Raymond O, Rosenthal DM, Lai Z, Livingstone K, Nakazato T, et al. Major ecological transitions in wild sunflowers facilitated by hybridization. Science. 2003;301:1211–1216. doi: 10.1126/science.1086949 [DOI] [PubMed] [Google Scholar]
  • 18.Barton NH. MULTILOCUS CLINES. Evolution. 1983;37:454–471. doi: 10.1111/j.1558-5646.1983.tb05563.x [DOI] [PubMed] [Google Scholar]
  • 19.Matute DR, Comeault AA, Earley E, Serrato-Capuchina A, Peede D, Monroy-Eklund A, et al. Rapid and Predictable Evolution of Admixed Populations Between Two Drosophila Species Pairs. Genetics. 2020;214:211–230. doi: 10.1534/genetics.119.302685 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Pamilo P. Polyandry and allele frequency differences between the sexes in the ant Formica aquilonia. Heredity. 1993;70:472–480. [Google Scholar]
  • 21.Maeder A, Cherix D, Bernasconi C, Freitag A, Ellis S. Wood ant reproductive biology and social systems. Wood Ant Ecology and Conservation. Cambridge University Press; 2016. p. 37–50. [Google Scholar]
  • 22.Beresford J, Elias M, Pluckrose L, Sundström L, Butlin RK, Pamilo P, et al. Widespread hybridization within mound-building wood ants in Southern Finland results in cytonuclear mismatches and potential for sex-specific hybrid breakdown. Mol Ecol. 2017;26:4013–4026. doi: 10.1111/mec.14183 [DOI] [PubMed] [Google Scholar]
  • 23.Kulmuni J, Seifert B, Pamilo P. Segregation distortion causes large-scale differences between male and female genomes in hybrid ants. Proc Natl Acad Sci U S A. 2010;107:7371–7376. doi: 10.1073/pnas.0912409107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Kulmuni J, Nouhaud P, Pluckrose L, Satokangas I, Dhaygude K, Butlin RK. Instability of natural selection at candidate barrier loci underlying speciation in wood ants. Mol Ecol. 2020;29:3988–3999. doi: 10.1111/mec.15606 [DOI] [PubMed] [Google Scholar]
  • 25.Martin-Roy R, Nygård E, Nouhaud P, Kulmuni J. Differences in Thermal Tolerance between Parental Species Could Fuel Thermal Adaptation in Hybrid Wood Ants. Am Nat. 2021;198:278–294. doi: 10.1086/715012 [DOI] [PubMed] [Google Scholar]
  • 26.Excoffier L, Marchi N, Marques DA, Matthey-Doret R, Gouy A, Sousa VC. fastsimcoal2: demographic inference under complex evolutionary scenarios. Bioinformatics. 2021. doi: 10.1093/bioinformatics/btab468 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Dias-Alves T, Mairal J, Blum MGB. Loter: A Software Package to Infer Local Ancestry for a Wide Range of Species. Mol Biol Evol. 2018;35:2318–2326. doi: 10.1093/molbev/msy126 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Martin SH, Van Belleghem SM. Exploring Evolutionary Relationships Across the Genome Using Topology Weighting. Genetics. 2017;206:429–438. doi: 10.1534/genetics.116.194720 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Kelleher J, Etheridge AM, McVean G. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Comput Biol. 2016;12:e1004842. doi: 10.1371/journal.pcbi.1004842 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Bierne N, Lenormand T, Bonhomme F, David P. Deleterious mutations in a hybrid zone: can mutational load decrease the barrier to gene flow? Genet Res. 2002;80:197–204. doi: 10.1017/s001667230200592x [DOI] [PubMed] [Google Scholar]
  • 31.Juric I, Aeschbacher S, Coop G. The Strength of Selection against Neanderthal Introgression. Reich D, editor. PLoS Genet. 2016;12:e1006340. doi: 10.1371/journal.pgen.1006340 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Martin SH, Davey JW, Salazar C, Jiggins CD. Recombination rate variation shapes barriers to introgression across butterfly genomes. Moyle L, editor. PLoS Biol. 2019;17: e2006288. doi: 10.1371/journal.pbio.2006288 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Moran BM, Payne C, Langdon Q, Powell DL, Brandvain Y, Schumer M. The genomic consequences of hybridization. Elife. 2021;10:e69016. doi: 10.7554/eLife.69016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Alachiotis N, Pavlidis P. RAiSD detects positive selection based on multiple signatures of a selective sweep and SNP vectors. Commun Biol. 2018;1:79. doi: 10.1038/s42003-018-0085-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Sachdeva H, Barton NH. Replicability of Introgression Under Linked. Polygenic Selection. Genetics. 2018;210:1411–1427. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Kulmuni J, Pamilo P. Introgression in hybrid ants is favored in females but selected against in males. Proc Natl Acad Sci U S A. 2014;111:12805–12810. doi: 10.1073/pnas.1323045111
  • 37.Blanckaert A, Payseur BA. Finding Hybrid Incompatibilities Using Genome Sequences from Hybrid Populations. Mol Biol Evol. 2021;38:4616–4627. doi: 10.1093/molbev/msab168
  • 38.Li J, Schumer M, Bank C. Imbalanced segregation of recombinant haplotypes in hybrid populations reveals inter- and intrachromosomal Dobzhansky-Muller incompatibilities. PLoS Genet. 2022;18:e1010120. doi: 10.1371/journal.pgen.1010120
  • 39.Sundström L, Seppä P, Pamilo P. Genetic population structure and dispersal patterns in Formica ants—a review. Ann Zool Fennici. 2005;42:163–177.
  • 40.Portinha B, Avril A, Bernasconi C, Helanterä H, Monaghan J, Seifert B, et al. Whole-genome analysis of multiple wood ant population pairs supports similar speciation histories, but different degrees of gene flow, across their European range. Mol Ecol. 2022;31:3416–3431.
  • 41.Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–595. doi: 10.1093/bioinformatics/btp698
  • 42.Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv [q-bio.GN]. 2012. Available from: http://arxiv.org/abs/1207.3907.
  • 43.Tan A, Abecasis GR, Kang HM. Unified representation of genetic variants. Bioinformatics. 2015;31:2202–2204. doi: 10.1093/bioinformatics/btv112
  • 44.Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10:giab008. doi: 10.1093/gigascience/giab008
  • 45.Pfeifer SP, Laurent S, Sousa VC, Linnen CR, Foll M, Excoffier L, et al. The Evolutionary History of Nebraska Deer Mice: Local Adaptation in the Face of Strong Gene Flow. Mol Biol Evol. 2018;35:792–806. doi: 10.1093/molbev/msy004
  • 46.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330
  • 47.Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8
  • 48.Frichot E, François O. LEA: An R package for landscape and ecological association studies. Methods Ecol Evol. 2015. doi: 10.1111/2041-210X.12382
  • 49.R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2019. Available from: https://www.R-project.org/.
  • 50.Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–780. doi: 10.1093/molbev/mst010
  • 51.Leigh JW, Bryant D. Popart: Full-feature software for haplotype network construction. Methods Ecol Evol. 2015;6:1110–1116.
  • 52.Brelsford A, Purcell J, Avril A, Tran Van P, Zhang J, Brütsch T, et al. An Ancient and Eroded Social Supergene Is Widespread across Formica Ants. Curr Biol. 2020;30:304–311.e4. doi: 10.1016/j.cub.2019.11.032
  • 53.Liu H, Jia Y, Sun X, Tian D, Hurst LD, Yang S. Direct Determination of the Mutation Rate in the Bumblebee Reveals Evidence for Weak Recombination-Associated Mutation and an Approximate Rate Constancy in Insects. Mol Biol Evol. 2017;34:119–130. doi: 10.1093/molbev/msw226
  • 54.Pedersen BS, Quinlan AR. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 2018;34:867–868. doi: 10.1093/bioinformatics/btx699
  • 55.Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M. Robust demographic inference from genomic and SNP data. PLoS Genet. 2013;9:e1003905. doi: 10.1371/journal.pgen.1003905
  • 56.Barroso G, Puzović N, Dutheil JY. Inference of recombination maps from a single pair of genomes and its application to ancient samples. PLoS Genet. 2019;15:e1008449. doi: 10.1371/journal.pgen.1008449
  • 57.Martin M, Patterson M, Garg S, Fischer SO, Pisanti N, Klau GW, et al. WhatsHap: fast and accurate read-based phasing. bioRxiv. 2016:085050. doi: 10.1101/085050
  • 58.Delaneau O, Zagury J-F, Robinson MR, Marchini JL, Dermitzakis ET. Accurate, scalable and integrative haplotype estimation. Nat Commun. 2019;10:5436. doi: 10.1038/s41467-019-13225-y
  • 59.Borowiec ML, Cover SP, Rabeling C. The evolution of social parasitism in Formica ants revealed by a global phylogeny. Proc Natl Acad Sci U S A. 2021;118:e2026029118. doi: 10.1073/pnas.2026029118
  • 60.Dhaygude K, Nair A, Johansson H, Wurm Y, Sundström L. The first draft genomes of the ant Formica exsecta, and its Wolbachia endosymbiont reveal extensive gene transfer from endosymbiont to host. BMC Genom. 2019;20:301. doi: 10.1186/s12864-019-5665-6
  • 61.Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352
  • 62.Yassin A, Debat V, Bastide H, Gidaszewski N, David JR, Pool JE. Recurrent specialization on a toxic fruit in an island Drosophila population. Proc Natl Acad Sci U S A. 2016;113:4771–4776. doi: 10.1073/pnas.1522559113
  • 63.Nouhaud P, Beresford J, Kulmuni J. Assembly of a Hybrid Formica aquilonia × F. polyctena Ant Genome From a Haploid Male. J Hered. 2022;113:353–359.
  • 64.Alexa A, Rahnenfuhrer J, Lengauer T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006;22:1600–1607. doi: 10.1093/bioinformatics/btl140

Decision Letter 0

Roland G Roberts

17 Mar 2022

Dear Dr Nouhaud,

Thank you for submitting your manuscript entitled "Rapid and repeatable genome evolution across three hybrid ant populations" for consideration as a Short Reports by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please Login to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by Mar 21 2022 11:59PM.

If your manuscript has been previously reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like to send previous reviewer reports to us, please email me at rroberts@plos.org to let me know, including the name of the previous journal and the manuscript ID the study was given, as well as attaching a point-by-point response to reviewers that details how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect some delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Roli Roberts

Roland Roberts

Senior Editor

PLOS Biology

rroberts@plos.org

Decision Letter 1

Roland G Roberts

20 May 2022

Dear Dr Nouhaud,

Thank you for your patience while your manuscript "Rapid and repeatable genome evolution across three hybrid ant populations" was peer-reviewed at PLOS Biology. It has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by four independent reviewers.

You'll see that while the reviewers are broadly positive about the study, they each raise a number of concerns; strikingly, for example, several of the reviewers think that your evidence for the hybrid populations being truly independent is currently inadequate, and this key issue must be addressed before further consideration.

In light of the reviews, which you will find at the end of this email, we would like to invite you to revise the work to thoroughly address the reviewers' reports.

Given the extent of revision needed, we cannot make a decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is likely to be sent for further evaluation by all or a subset of the reviewers.

We expect to receive your revised manuscript within 3 months. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may withdraw it.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roli

Roland Roberts

Senior Editor

PLOS Biology

rroberts@plos.org

------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

I enjoyed reading the manuscript, which was well written and addresses a timely question about hybridization using approaches from an evolutionary genomics perspective. The main results are that (probably) independent hybrid populations (i) show similar evolutionary sorting of ancestry, and (ii) that the tendency for regions to be sorted is predictable based on features such as gene density, previous history of selection, and local recombination rate.

The system is compelling, the data are high quality and this quality is matched by a quality analysis. This work will clearly be the foundation for a high-quality and long-term research programme addressing the most outstanding questions in the field.

A side effect of timeliness, however, is a rapidly expanding literature among which this work will stand. Importantly, I found that this paper did not cite several recent and highly relevant articles (I am not an author on any papers referenced). In particular, recent work in swordtail fish, most importantly including Langdon et al. 2022 PLos Genetics, should probably be referenced and discussed. I would also like to see the patterns put more into the context of emerging theory, for example by Carl Veller (bioRxiv 2019 and ensuing articles).

I have minor concerns about the analysis that can probably be solved with a bit more transparency. In particular I am slightly concerned that hybrid incompatibilities could be filtered out by the Hardy-Weinberg based heterozygosity filter. I'd also like to see a bit more empirical rigour about why the authors suppose the mitochondria are such good evidence for independence.

Finally, the authors do not really discuss or address hybrid incompatibilities outside of the first paragraph of the main text. As BDMIs are hypothesized to be a major driver of hybrid genome evolution, this seems like a rather large omission.

Line-by-line comments:

Line 18: This is good framing for the study, but I believe this is becoming increasingly clear.

Line 38: I would argue that an additional reason to study hybridization is that it is a way to 'kick the tires' of the genome, to see how it works. We can discover new gene functions and interactions sometimes only in hybrids.

Line 65: This seems like a bit of a bait-and-switch when the models are equally supported below.

Line 66: It would seem that a reference to Molly Schumer's work (published in PNAS) where assortative mating maintains two separate sympatric hybrid populations would be appropriate here.

Line 66-9: It might seem trivial, but I think referencing a paper or supplementary result showing this explicitly would be valuable, especially in light of results below that suggest this is equivocal. This is important because the extent to which the paper's findings are of interest is largely determined by the degree of independence.

Line 71: Seems to contrast with statement below stating that F. polyctena ancestry is "more prevalent genome-wide".

Line 75: Unclear from the text whether the models also included mitochondrial haplotype.

Line 83: This paragraph is well-written and the analyses are strong.

Line 100: Could migration of haploid males (who would not transmit mitochondria) perhaps be responsible? (Am not an ant expert so this might be an ignorant statement).

Line 159: Suggest specifying whether sampled populations were taken from single-species colonies in areas of allopatry or in areas of sympatry (i.e., locations are given but this does not necessarily imply that they are allopatric).

Line 193: Recent theory and empirical studies suggest that hybrid incompatibilities cause selection for positive heterozygosity (Simon et al. 2018, Evol Letters; Thompson et al. 2022, PLoS Biology). Moreover, selection against recessive incompatibilities could also leave this signature. Could the authors comment on whether this step might filter out real signal? Do these sites show biased ancestry? What is the typical heterozygosity of excluded sites? Also, could the authors provide information and a graph to see if this was affecting regions or single SNPs? I'd like to know how much data loss this specific step resulted in.
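The questions above (how much data the heterozygosity filter removes, the typical heterozygosity of excluded sites, and whether exclusions cluster in regions or hit single SNPs) can be answered with a simple per-site audit. The Python sketch below is illustrative only: the `Site` record, the observed/expected heterozygosity ratio used as a stand-in for a Hardy-Weinberg test, and the `max_ratio` threshold of 1.5 are assumptions, not the filter actually applied in the manuscript.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Site:
    """Genotype counts at one biallelic SNP (hypothetical record layout)."""
    scaffold: str
    pos: int
    n_hom_ref: int
    n_het: int
    n_hom_alt: int


def heterozygosity(site):
    """Return (observed, expected) heterozygosity, expected = 2pq under HWE."""
    n = site.n_hom_ref + site.n_het + site.n_hom_alt
    if n == 0:
        return 0.0, 0.0
    p = (2 * site.n_hom_ref + site.n_het) / (2 * n)
    return site.n_het / n, 2 * p * (1 - p)


def audit_excess_het(sites, max_ratio=1.5):
    """Flag sites whose Ho/He exceeds max_ratio and summarize what is removed."""
    kept, dropped = [], []
    for site in sites:
        ho, he = heterozygosity(site)
        if he > 0 and ho / he > max_ratio:
            dropped.append(site)
        else:
            kept.append(site)
    ho_dropped = [heterozygosity(s)[0] for s in dropped]
    return {
        "kept": kept,
        "dropped": dropped,
        "fraction_dropped": len(dropped) / len(sites) if sites else 0.0,
        "mean_ho_dropped": sum(ho_dropped) / len(ho_dropped) if ho_dropped else 0.0,
        # Many drops on one scaffold hint at a region rather than scattered SNPs.
        "dropped_per_scaffold": Counter(s.scaffold for s in dropped),
    }


if __name__ == "__main__":
    demo = [
        Site("Scaffold01", 100, 0, 10, 0),    # every individual heterozygous
        Site("Scaffold01", 200, 25, 50, 25),  # Hardy-Weinberg proportions
        Site("Scaffold02", 300, 10, 0, 0),    # monomorphic
    ]
    report = audit_excess_het(demo)
    print(report["fraction_dropped"], dict(report["dropped_per_scaffold"]))
```

Plotting `dropped_per_scaffold` along each scaffold would show whether excluded sites form contiguous blocks, which is the pattern one might expect if recessive incompatibilities maintain heterozygosity over a region.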

Line 226: Could these SNPs be used in the 'selection' analyses?

Line 279: Are migrants haploid males, diploid females, or both?

Figure 1: Panel C, K=2 is not what one expects from reading the text. The hybrid populations do appear to show minor parent ancestry and different directions for different populations.

Figure 2: Panel A, not immediately clear why Scaffold 13 is used. Perhaps the figure could note why it was selected. Panels C-D, I must say that I found the 'topology weighting' approach from the figure was not what I expected from the text. The approaches taken by Langdon et al. 2022 PLoS Genetics are, to my eye, more intuitive and easier to evaluate, though I must say that after spending some time with the analysis I like the weighting approach and it is sound. I think the inset Figures added to my initial confusion: for the topologies it appears as though the 'hybrid' is a separate entity from the two parent species, when the approach is really asking which of the two parents the sequences from hybrid populations resemble. Perhaps some annotation could help here, e.g., 'hybrid population sequence similar to F. polyctena'.

Figure 3: The presentation of data in panel A is wonderful. I haven't seen it before and if the authors came up with it, I think it is quite an innovation. For Panel B it would seem that the axes should be flipped: isn't the statement that coding SNPs predict ancestry, not that ancestry predicts coding SNPs?

Figure 4: This is also a very cool result and I think quite novel. In the caption instead of 'plain' I might suggest 'solid'.

Reviewer #2:

This paper provides evidence that sorting of ancestry in hybrid ant populations is repeatable and predictable based on inferred patterns of selection and recombination rate variation. The degree of inferred repeatability (correlations >0.5 for ancestry between pairs of populations) is really remarkable. If the results hold (see my comments below) this is a really exciting study.

My main concern is that interpretation of the results depends heavily on whether (or rather to what extent) the four hybrid ant populations evolved independently (i.e., independent origins and subsequent connection versus not by gene flow). The authors are clearly aware of this as well, and several analyses are aimed at addressing the degree of independence. I appreciate this, but I am not yet fully convinced that the paper has sufficiently ruled out ancestry correlations being mostly driven by shared histories.

First, as the authors note, models of shared versus independent origins had similar likelihoods. This concern is somewhat tempered by the evidence that a modest number of generations (~20% of the time since admixture or 10 out of 50 generations) passed between admixture and population splitting in the single origin models. Still, it is not clear what proportion of the observed ancestry correlations can be explained by a single origin model with ~10 generations of shared history (I am not just interested in whether the correlations are higher than expected under such a null model, but how much higher). This is an important omission.
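To make this first point concrete, a toy neutral model can show how much between-population ancestry correlation a short period of shared history produces on its own. The Python sketch below is a deliberately simplified illustration, not the fastsimcoal2/msprime models used in the study: it assumes unlinked loci, a constant population of `n_genes` gene copies, no migration, no selection and no haplodiploidy, and the admixture proportion, population size and generation counts are arbitrary placeholders.

```python
import random
from math import sqrt


def drift(freqs, n_genes, rng):
    """One generation of Wright-Fisher binomial sampling at each unlinked locus."""
    return [sum(rng.random() < p for _ in range(n_genes)) / n_genes for p in freqs]


def simulate_pair(n_loci, p0, n_genes, t_shared, t_total, seed=1):
    """Ancestry frequencies in two hybrid populations that share t_shared
    generations after admixture (proportion p0), then drift independently."""
    rng = random.Random(seed)
    shared = [p0] * n_loci
    for _ in range(t_shared):
        shared = drift(shared, n_genes, rng)
    pop_a, pop_b = list(shared), list(shared)
    for _ in range(t_total - t_shared):
        pop_a = drift(pop_a, n_genes, rng)
        pop_b = drift(pop_b, n_genes, rng)
    return pop_a, pop_b


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # 0 = fully independent origins; 10 of 50 generations shared; 50 = one population
    for t_shared in (0, 10, 50):
        a, b = simulate_pair(n_loci=500, p0=0.5, n_genes=40, t_shared=t_shared, t_total=50)
        print(t_shared, round(pearson(a, b), 2))
```

Repeating such runs many times gives a null distribution of correlations for a given amount of shared history, against which the observed correlations can be compared in absolute terms rather than only by significance.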

Second, it is not clear that the best models from fastsimcoal2 are good models for the data. Specifically, the paper notes several ways in which these demographic models fail to predict patterns of ancestry (e.g., degree of sorting of ancestry). If the models do not explain the data particularly well (even if they are the best of the models considered), then it is harder to interpret deviations from predictions from the models as evidence of selection. I think two things should be done to address this issue. First, it would be nice to see that data simulated under the best models recreate patterns of genetic variation within and among the populations in general (even if not patterns of ancestry). Second, and especially if the models do not appear to predict genetic variation well, I think demographic models (especially models for single versus multiple origins) should be fit for the ancestry data directly (perhaps from the simple, naive chromosome painting, but any would be fine). It would be very interesting to see whether such models would suggest a single origin with more shared history.

My other more minor comments (with approximate line numbers) follow:

L35-45. I appreciate the brevity of the introductory paragraph but think that it consequently has fallen a bit short on putting the work in context. Most notably, this is not the first study to look at patterns of genome sorting (genome stabilization) in hybrid lineages. Perhaps the best example concerns hybrid sunflowers (see, e.g., https://doi.org/10.1111/j.1558-5646.2007.00267.x; DOI: 10.1126/science.1086949), but other examples that consider repeated instances of hybrid lineage formation (Italian sparrows, Lycaeides butterflies) or that explicitly consider selection and recombination (swordtail fish) exist. This doesn't all need to be cited, but some reference somewhere to the relevant literature would be good. Likewise, the references in this section are all from the past few years but thinking on this topic goes back decades. Lastly, the term genome stabilization has been used in many papers to refer to what is here called genome sorting. I don't have a strong preference for one term over the other, but the connection should be made parenthetically.

L55-58. I think it is a bit strong to refer to this as "an ideal test case" when the evidence is only that the populations "may have independent origins" (an ideal case would be if there was very strong evidence or near certainty of independent origins).

L89. I recommend reporting the range of correlations here. They are high and worth calling out in the main text (just stating that the correlation is significantly different than 0 is much less interesting).
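Reporting the range of pairwise correlations, as suggested here, is straightforward once per-window ancestry estimates exist for each hybrid population. A minimal Python sketch, assuming a hypothetical dict that maps population names to equal-length lists of window-level ancestry proportions (for example, the fraction of hybrid haplotypes assigned to F. polyctena in each window); the population names and values are placeholders.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(ancestry):
    """ancestry: population -> per-window ancestry proportions (same windows)."""
    return {
        (a, b): pearson(ancestry[a], ancestry[b])
        for a, b in combinations(sorted(ancestry), 2)
    }


def correlation_range(ancestry):
    """Smallest and largest pairwise correlation across all population pairs."""
    values = list(pairwise_correlations(ancestry).values())
    return min(values), max(values)


if __name__ == "__main__":
    windows = {  # hypothetical F. polyctena ancestry in five windows
        "hybrid_1": [0.05, 0.40, 0.90, 0.60, 0.20],
        "hybrid_2": [0.10, 0.35, 0.95, 0.55, 0.30],
        "hybrid_3": [0.00, 0.50, 0.80, 0.70, 0.10],
    }
    print(pairwise_correlations(windows))
    print(correlation_range(windows))
```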

L93. Consider simulating not just from the best model, but from several of the best competing models.

L112-125. These patterns of enrichment are quite cool. They definitely increase my confidence that selection has contributed to the patterns of observed ancestry, though they still don't rule out a major role for shared history for the similarities among populations.

L179. State what species the reference genome is for here.

L184. Please define "normalized" in this context.

Reviewer #3:

This manuscript assesses the repeatability of genetic outcomes of hybridization in four Finnish populations of ants (in three localities), each with mixed ancestry from the widespread species Formica aquilonia and Formica polyctena. The authors find that each of the hybrid populations has fixed F. aquilonia ancestry in some parts of the genome and F. polyctena ancestry in others, and that the same genomic regions tend to be fixed for the same parental species ancestry across multiple hybrid populations. Additionally, genomic regions with low recombination rate and high gene density tend to fix alleles from the parent species with the larger inferred population size. Taken together, this suggests that natural selection shapes the similar outcomes of hybridization in multiple hybrid populations. The work addresses a consequential question in the field of speciation genetics, and the writing and figures are clear and easy to follow.

There are two areas where I am not fully convinced: First, the independence of the three localities with hybrid populations seems a bit overstated as currently written. As the authors note, the likelihoods of models assuming a single origin of hybrids vs. independent origin of each hybrid population are similar. The main text doesn't point out that the likelihoods are higher for single origin than independent origin in all four pairwise comparisons of hybrid populations, suggesting that the histories of these Finnish hybrid populations are not completely independent of each other. The argument in the text that mitochondrial haplotype networks support three independent origins of hybridization doesn't appear very strong; there is substantial mitochondrial polymorphism in both parent species, and some of this variation could have been present in a single ancestral hybrid population before fixing different alleles in different descendant hybrid populations.

A more substantial logical inconsistency is that the study relies heavily on analyses that assume neutrality in order to argue for pervasive effects of selection across the genome. To what extent and in what direction would the inferences of demographic history and recombination be biased by the selection required to produce the observed levels of sorting of parental variation in the hybrid populations, and the number and extent of inferred selective sweeps in the parent species? The demographic results, in particular, would be more convincing if similar results are obtained when restricting the analysis to genomic regions that are not sorted in hybrids, and/or have low μ scores in the scan for signatures of selection. In fact, convergent selection in hybrid populations with truly independent origins might even produce a false signal of shared ancestry or gene flow.
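The robustness check suggested above amounts to re-running the demographic inference on a subset of windows. A Python sketch of the window-selection step only, in which the `Window` fields, the sorting thresholds (ancestry at or below 0.05, or at or above 0.95, in any hybrid population counts as sorted) and the μ quantile cut-off are hypothetical choices for illustration, not values taken from the study:

```python
import math
from dataclasses import dataclass


@dataclass
class Window:
    """One genomic window (hypothetical fields)."""
    scaffold: str
    start: int
    ancestry: dict  # hybrid population -> ancestry proportion in [0, 1]
    mu: float       # sweep statistic for the window, assumed precomputed


def is_sorted(window, lo=0.05, hi=0.95):
    """True if any hybrid population is (near-)fixed for one parental ancestry."""
    return any(a <= lo or a >= hi for a in window.ancestry.values())


def nearest_rank_quantile(values, q):
    """Nearest-rank quantile for 0 < q <= 1."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


def neutral_subset(windows, mu_quantile=0.5, lo=0.05, hi=0.95):
    """Windows unsorted in every hybrid population and with mu at or below the cut-off."""
    cutoff = nearest_rank_quantile([w.mu for w in windows], mu_quantile)
    return [w for w in windows if not is_sorted(w, lo, hi) and w.mu <= cutoff]


if __name__ == "__main__":
    demo = [
        Window("s1", 0, {"A": 0.50, "B": 0.40}, 1.0),
        Window("s1", 10_000, {"A": 0.99, "B": 0.50}, 0.5),  # sorted in A
        Window("s2", 0, {"A": 0.30, "B": 0.60}, 5.0),       # strong sweep signal
        Window("s2", 10_000, {"A": 0.20, "B": 0.70}, 2.0),
    ]
    print([(w.scaffold, w.start) for w in neutral_subset(demo)])
```

The retained windows would then feed the same site-frequency-spectrum based inference, so that any change in the preferred model can be attributed to excluding putatively selected regions.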

Minor details:

L386: change "was ran" to "was run"

Tables S3 and S4 have Excel formula errors, such that Δlikelihood and LogLikelihood are identical for four of the models in each table.
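For reference, a Δlikelihood column in such tables is commonly the difference between the best model's log-likelihood and each model's own, so it should be 0 for the best model rather than equal to the log-likelihood. A Python sketch, assuming the inputs are log10 likelihoods as fastsimcoal2 reports them and adding an AIC computed after conversion to natural logs; model names, likelihood values and parameter counts are hypothetical.

```python
from math import log


def delta_and_aic(models):
    """models: name -> (log10 likelihood, number of free parameters).

    Returns name -> (delta log10 likelihood relative to the best model, AIC).
    """
    best = max(ll10 for ll10, _ in models.values())
    table = {}
    for name, (ll10, k) in models.items():
        ln_l = ll10 * log(10)  # AIC needs natural-log likelihoods
        table[name] = (best - ll10, 2 * k - 2 * ln_l)
    return table


if __name__ == "__main__":
    hypothetical = {"single_origin": (-1000.0, 10), "independent_origins": (-1002.5, 9)}
    for name, (delta, aic) in delta_and_aic(hypothetical).items():
        print(f"{name}: delta={delta:.2f} AIC={aic:.1f}")
```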

Reviewer #4:

This paper presents fascinating findings. Namely, that three separate hybridization events have occurred between two closely related species of ants, and that each hybrid population tends to share similar ancestry components along the genome. They suggest this pattern of correlated ancestry component sorting cannot be explained by chance and that shared selection pressure for specific ancestry components is the best explanation. I found the paper to be well-written and their approach and methods to be sensible and clearly justified. In particular, the authors take considerable efforts to model and account for demographic history and recombination rate variation, both of which affect signatures of selection and hybridization.

Despite very much liking the paper overall, I have some criticisms related to how the authors presented and used simulations to support their claims.

In the results section on line 90, the authors state: "To test whether such predictability would be expected under neutrality, we used MSPRIME (15) to simulate neutral admixture scenarios following the best history and demographic parameters inferred for each population pair"

Related to this in the methods section on line 380 they state: "To do so, we used the parameters inferred by the best model identified for each population pair with FASTSIMCOAL2 and ran simulations in MSPRIME (v1.0.2, 15), modeling 100 non-recombining 10 kbp blocks with a recombination rate of 10⁻⁶ within blocks."

This is more or less all the details readers are given in the text about the msprime simulations, and it is difficult for me to tell what exactly was done. If I understand correctly, the authors fit models with FASTSIMCOAL2 that included all their populations. Did the msprime simulations exactly match the models inferred in FASTSIMCOAL2? Were all populations and their demographic changes and sizes included in these simulations, or were simulations done separately for different pairs? It seems quite important to me that the authors did the former in order to make apples-to-apples comparisons to their empirical results, but I cannot quite tell from these sentences if that is what they did, or if simpler simulations with two populations at a time were considered. Please better clarify what was done and justify these decisions in the text.

Second and more importantly, given the similar fit of the shared and independent origins models (as explained on line 75), the authors need to assess the correlated sorting of ancestry from simulations for both model types. Does a model assuming shared origins have correlations close to the observed data, or does selection still need to be invoked to explain the observed patterns, even if the shared origins model is actually the correct one? Related to my comments above, using msprime simulations that include all populations simultaneously (rather than pairs) seems important to adequately represent a shared hybrid origins model. Demonstrating that the correlation in ancestry along the genome is due to selection sorting the ancestry components, even if the inferred shared hybridization event model is actually correct, would greatly aid the main (and most exciting) claims of the paper.
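The requests from Reviewers #2 and #4 (simulate under several competing models, and quantify how far the observed correlation exceeds each null) reduce to comparing one observed statistic with a set of simulated distributions. A minimal Python sketch, assuming the per-replicate correlations have already been computed from simulations under each model; model names and numbers are hypothetical, and the empirical p-value uses the common (1 + r)/(1 + n) correction, where r is the number of replicates at least as large as the observation.

```python
from statistics import mean, stdev


def compare_to_nulls(observed, nulls):
    """Compare an observed ancestry correlation with simulated null distributions.

    nulls: model name -> correlations from replicate neutral simulations.
    """
    summary = {}
    for model, sims in nulls.items():
        m = mean(sims)
        sd = stdev(sims) if len(sims) > 1 else float("nan")
        n_at_least = sum(s >= observed for s in sims)
        summary[model] = {
            "null_mean": m,
            "excess": observed - m,  # how much higher than the neutral expectation
            "z": (observed - m) / sd if sd > 0 else float("nan"),
            "p_empirical": (1 + n_at_least) / (1 + len(sims)),
        }
    return summary


if __name__ == "__main__":
    hypothetical_nulls = {
        "independent_origins": [0.02, -0.05, 0.08, 0.01, 0.04],
        "single_origin": [0.35, 0.28, 0.41, 0.30, 0.38],
    }
    for model, stats in compare_to_nulls(0.55, hypothetical_nulls).items():
        print(model, {k: round(v, 3) for k, v in stats.items()})
```

Reporting `excess` alongside `p_empirical` for each model answers both "is it higher" and "by how much", which matters most when the single-origin model cannot be rejected on likelihood alone.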

Decision Letter 2

Roland G Roberts

27 Oct 2022

Dear Dr Nouhaud,

Thank you for your patience while we considered your revised manuscript "Rapid and predictable genome evolution across three hybrid ant populations" for publication as a Short Reports at PLOS Biology. This revised version of your manuscript has been evaluated by the PLOS Biology editors, the Academic Editor and three of the original reviewers.

Based on the reviews, we are likely to accept this manuscript for publication, provided you satisfactorily address the following data and other policy-related requests.

a) Please address my Data Policy requests below; specifically, we need you to supply the numerical values underlying Figs 1BCDEF, 2ABCDE, 3AB, 4AB, S1, S2, S3, S4, S5, either as a supplementary data file or as a permanent DOI’d deposition, e.g. part of your Figshare depo (if some of your Figures can be generated from the data already in Figshare, please clarify).

b) Please cite the location of the data clearly in all relevant main and supplementary Figure legends, e.g. “The data underlying this Figure can be found in S1 Data” or “The data underlying this Figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v1”

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Roli Roberts

Roland Roberts, PhD

Senior Editor,

rroberts@plos.org,

PLOS Biology

------------------------------------------------------------------------

DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: Figs 1BCDEF, 2ABCDE, 3AB, 4AB, S1, S2, S3, S4, S5. NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

IMPORTANT: Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

------------------------------------------------------------------------

SPECIES INDICATED IN THE ABSTRACT?

- Please note that per journal policy, the model system/species studied should be clearly stated in the abstract of your manuscript.

------------------------------------------------------------------------

BLOT AND GEL REPORTING REQUIREMENTS:

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare and upload them now. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

------------------------------------------------------------------------

DATA NOT SHOWN?

- Please note that per journal policy, we do not allow the mention of "data not shown", "personal communication", "manuscript in preparation" or other references to data that is not publicly available or contained within this manuscript. Please either remove mention of these data or provide figures presenting the results and the data underlying the figure(s).

------------------------------------------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

I think the authors have done an exemplary job of engaging thoughtfully with reviewers. I am happy to sign off.

Reviewer #3:

The authors have done excellent work in response to my comments and, in my opinion, those of the other reviewers as well.

Reviewer #4:

I commend the authors for the considerable effort they put into addressing reviewer concerns. I think the manuscript is improved from their efforts. I do not have any further issues or recommendations to raise.

Decision Letter 3

Roland G Roberts

14 Nov 2022

Dear Dr Nouhaud,

Thank you for the submission of your revised Short Report "Rapid and predictable genome evolution across three hybrid ant populations" for publication in PLOS Biology. On behalf of my colleagues and the Academic Editor, Leonie Moyle, I'm pleased to say that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely, 

Roli Roberts

Roland G Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Visualization of the first 5 principal components of the principal component analysis performed over 46,886 SNPs genome-wide (5 kb-thinned, MAC ≥ 2).

    The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

    (TIF)

    S2 Fig. Comparison of ancestry mapping approaches.

    For each hybrid population (columns), the panels show TWISST ΔWEIGHT statistics vs. LOTER local ancestry estimates (first row, 14,890 100-SNP windows), TWISST ΔWEIGHT statistics vs. naive chromosome painting local ancestry estimates (PAINTING, second row, 5,529 windows with at least 5 ancestry-informative SNPs), and LOTER vs. naive chromosome painting local ancestry estimates (third row, 5,529 windows with at least 5 ancestry-informative SNPs). ΔWEIGHT ranges from −1, if all topologies in the window group the hybrid population with F. aquilonia, to +1, if all group it with F. polyctena. LOTER and naive chromosome painting are both SNP-based (results averaged over windows) and code ancestries as 0 for F. aquilonia and 1 for F. polyctena. In each panel, the regression line is indicated in white. ρ, Spearman’s correlation coefficient; P, P-value of the Spearman’s correlation test. The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

    (TIF)
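The legend above gives only the range of ΔWEIGHT, not its formula. A minimal sketch, assuming a four-taxon TWISST setup (hybrid population, F. aquilonia, F. polyctena, outgroup) with three possible topologies, and assuming ΔWEIGHT is the F. polyctena-grouping weighting minus the F. aquilonia-grouping weighting, reproduces that −1 to +1 range. The function name `delta_weight` and its arguments are hypothetical and are not taken from the authors' scripts (https://github.com/pi3rrr3/antmixture).

```python
def delta_weight(w_aq: float, w_pol: float, w_other: float) -> float:
    """Hypothetical ΔWEIGHT for one genomic window.

    w_aq:    weighting of the topology grouping the hybrid with F. aquilonia
    w_pol:   weighting of the topology grouping the hybrid with F. polyctena
    w_other: weighting of the remaining four-taxon topology
    Weightings may be raw subtree counts or proportions; they are normalized here.
    """
    total = w_aq + w_pol + w_other
    if total <= 0:
        raise ValueError("topology weightings must sum to a positive value")
    # Assumed definition: polyctena weighting minus aquilonia weighting,
    # giving -1 (all topologies aquilonia-grouping) to +1 (all polyctena-grouping).
    return (w_pol - w_aq) / total


if __name__ == "__main__":
    # Illustrative windows only (not data from the study).
    windows = [(100, 0, 0), (0, 100, 0), (30, 60, 10)]
    for w in windows:
        print(w, delta_weight(*w))
```

Normalizing by the total makes the statistic insensitive to whether weightings are supplied as counts or as proportions; the third topology (grouping the hybrid with the outgroup) only shrinks ΔWEIGHT toward zero.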

    S3 Fig. Observed (OBS.) and simulated levels of local ancestry correlation (left, each point is a chromosome) and sorting (right) for each hybrid population pair (rows).

    The degree of sorting is measured as the absolute F. aquilonia or F. polyctena weighting. IO: independent origins scenario, SO: single origin scenario (100 independent runs per scenario). The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

    (TIF)

    S4 Fig. Principal component analyses of observed and simulated datasets for each hybrid population pair (rows).

    Observed PCAs were obtained as per Fig 1 (5 kb-thinned SNP data, minor allele count ≥2). Simulations were run with msprime using parameter estimates inferred under both single and independent origins scenarios with fastsimcoal2 and assuming a mutation rate of 3.5 × 10⁻⁹. One run was randomly picked per simulated scenario. The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

    (TIF)

    S5 Fig

    Distribution of LOTER local ancestry estimates (x-axis, 0: fixed for F. aquilonia ancestry component, 1: fixed for F. polyctena ancestry component) across recombination rate (upper row) and gene density (lower row) quartiles in each hybrid population (columns), computed over 20 kbp non-overlapping windows. Medians are indicated with black dots. The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.c.6140793.v3.

    (TIF)

    S1 Table. Sample information, sequencing statistics, hybrid indices, and accession numbers for each sample analyzed in the study.

    In the caste column, w: worker and q: young unmated queen. Principal component coordinates (PC1 and PC2) and F. aquilonia admixture proportions (sNMF clustering analysis run with K = 2, sNMF_K2_Faq) are both based on a 5 kbp-thinned dataset of 46,896 SNPs with MAC ≥ 2. HI_loter values are based on 1,490,364 phased SNPs, with non-Finnish species samples used as reference panels. HI_paint values are based on 79,336 ancestry-informative markers displaying allele frequency differences ≥ 80% between non-Finnish species samples.

    (XLSX)
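The naive chromosome painting behind HI_paint can be sketched as follows. Only the ≥ 80% allele frequency difference threshold between the non-Finnish reference panels comes from the legend above; the data layout (diploid alternate-allele dosages 0, 1 or 2, with None for missing calls), the scoring of each marker as the proportion of F. polyctena-enriched alleles, and the function names are assumptions for illustration, not the authors' pipeline.

```python
from typing import List, Optional, Sequence, Tuple

AIM_THRESHOLD = 0.8  # allele frequency difference >= 80%, as stated in the S1 Table legend


def alt_freq(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Alternate-allele frequency from diploid dosages (0, 1, 2); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def find_aims(aq_panel: Sequence[Sequence[Optional[int]]],
              pol_panel: Sequence[Sequence[Optional[int]]],
              threshold: float = AIM_THRESHOLD) -> List[Tuple[int, bool]]:
    """Hypothetical AIM filter: returns (snp_index, polyctena_allele_is_alt) pairs.

    Each panel is a list of SNPs; each SNP holds the dosages of the reference individuals.
    """
    aims = []
    for i, (aq, pol) in enumerate(zip(aq_panel, pol_panel)):
        p_aq, p_pol = alt_freq(aq), alt_freq(pol)
        if p_aq is None or p_pol is None:
            continue  # skip SNPs with no calls in either reference panel
        if abs(p_pol - p_aq) >= threshold:
            aims.append((i, p_pol > p_aq))
    return aims


def hybrid_index(sample: Sequence[Optional[int]],
                 aims: Sequence[Tuple[int, bool]]) -> Optional[float]:
    """Mean ancestry over AIMs, coded 0 = F. aquilonia, 1 = F. polyctena (assumed scoring)."""
    scores = []
    for i, pol_is_alt in aims:
        g = sample[i]
        if g is None:
            continue
        pol_dosage = g if pol_is_alt else 2 - g  # copies of the polyctena-enriched allele
        scores.append(pol_dosage / 2)
    return sum(scores) / len(scores) if scores else None
```

Polarizing each marker toward the F. polyctena-enriched allele lets markers where that allele is the reference and markers where it is the alternate contribute on the same 0 to 1 scale, matching the 0/1 ancestry coding described for Fig S2.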

    S2 Table. Demographic parameters estimated by fastsimcoal2 in demographic model analyses.

    Unless bounded, the upper limit of the search range could be exceeded. Each model used only a subset of these parameters. The time of admixture parameter (TADMS) indicated with an asterisk (*) was first unconstrained, and then constrained to test for recent hybridization (<50 generations). Double asterisks (**) mark parameters whose calculation changes between models. The alternative minimum and maximum bounds are displayed in the respective columns.

    (XLSX)

    S3 Table. Maximum likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid population sampled in Pikkala (contained 348,228 sites).

    All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,377,703.973. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

    (XLSX)

    S4 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid population sampled in Bunkkeri (contained 463,401 sites).

    All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,821,814.189. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

    (XLSX)

    S5 Table. Maximum likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid population sampled in LångholmenW (contained 289,948 sites).

    All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,144,658.203. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

    (XLSX)

    S6 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid population sampled in LångholmenR (contained 223,927 sites).

    All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −883,471.568. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

    (XLSX)

    S7 Table. Demographic parameters estimated by fastsimcoal2 in demographic model analyses.

    Unless bounded, the upper limit of the search range could be exceeded. Each model used only a subset of these parameters.

    (XLSX)

    S8 Table. Maximum likelihood parameter estimates for the models concerning the history of the F. aquilonia × F. polyctena hybrid populations sampled in Bunkkeri and Pikkala (contained 328,913 sites).

    All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,509,459.108. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

    (XLSX)

    S9 Table. Maximum likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid populations sampled in Bunkkeri and LångholmenW (contained 282,215 sites).

    All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,291,308.181. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

    (XLSX)

    S10 Table. Maximum likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid populations sampled in Pikkala and LångholmenW (contained 218,545 sites).

    All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −1,001,623.462. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

    (XLSX)

    S11 Table. Maximum-likelihood parameter estimates for all models concerning the history of the F. aquilonia × F. polyctena hybrid populations sampled in LångholmenW and LångholmenR (contained 150,207 sites).

    All effective sizes (Ne) are given in number of haploids. Times are given in number of generations. Migration rates are scaled according to population effective sizes (2Nm). Maximum-likelihood estimates for parameters are taken from the run reaching the highest composite likelihood of the 100 runs performed. Likelihoods are given in logarithmic scale. Maximum observed likelihood for this dataset is −693,262.061. ΔLikelihood is calculated by subtracting the expected likelihood from the maximum observed likelihood.

    (XLSX)

    S12 Table. Observed and simulated average levels of diversity (π) for each population in each comparison.

    Simulated estimates were generated with msprime using both models inferred with fastsimcoal2 (SO: single origin scenario, IO: independent origin scenario). Simulated estimates were averaged over 100 independent runs.

    (XLSX)

    S13 Table. Observed and simulated average levels of divergence (FST) between populations within each comparison.

    Simulated estimates were generated with msprime using both models inferred with fastsimcoal2 (SO: single origin scenario, IO: independent origin scenario). Simulated estimates were averaged over 100 independent runs.

    (XLSX)

    Attachment

    Submitted filename: Nouhaud_2022_response.pdf

    Data Availability Statement

    All FASTQ files are available on ENA under project PRJEB55288 (hybrid samples) and PRJEB51899 (F. aquilonia & F. polyctena samples). VCF files, FASTSIMCOAL2 files and scripts, and statistics computed over genomic windows are available from figshare: https://doi.org/10.6084/m9.figshare.c.6140793.v3. Bioinformatic and MSPRIME scripts are available from https://github.com/pi3rrr3/antmixture.


    Articles from PLOS Biology are provided here courtesy of PLOS
