PLOS Genetics. 2024 Dec 23;20(12):e1011521. doi: 10.1371/journal.pgen.1011521

Polyploids broadly generate novel haplotypes from trans-specific variation in Arabidopsis arenosa and Arabidopsis lyrata

Magdalena Bohutínská 1,2,3,*, Eliška Petříková 1, Tom R Booker 4, Cristina Vives Cobo 1, Jakub Vlček 1,5, Gabriela Šrámková 1, Alžběta Poupětová 1, Jakub Hojka 1,2, Karol Marhold 1, Levi Yant 1,6,#, Filip Kolář 1,2,#, Roswitha Schmickl 1,2,*,#
Editor: Yalong Guo 7
PMCID: PMC11706510  PMID: 39715277

Abstract

Polyploidy, the result of whole genome duplication (WGD), is widespread across the tree of life and is often associated with speciation and adaptability. It is thought that adaptation in autopolyploids (within-species polyploids) may be facilitated by increased access to genetic variation. This variation may be sourced from gene flow with sister diploids and new access to other tetraploid lineages, as well as from increased mutational targets provided by doubled DNA content. Here, we deconstruct in detail the origins of haplotypes displaying the strongest selection signals in established, successful autopolyploids, Arabidopsis lyrata and Arabidopsis arenosa. We see strong signatures of selection in 17 genes implicated in meiosis, cell cycle, and transcription across all four autotetraploid lineages present in our expanded sampling of 983 sequenced genomes. Most prominent in our results is the finding that the tetraploid-characteristic haplotypes with the most robust signals of selection were completely absent in all diploid sisters. Instead, the fine-scaled variant ‘mosaics’ in the tetraploids originated from highly diverse evolutionary sources. These include widespread novel reassortments of trans-specific polymorphism from diploids, new mutations, and tetraploid-specific inter-species hybridization, a pattern that is in line with the broad-scale acquisition and reshuffling of potentially adaptive variation in tetraploids.

Author summary

Polyploidy, the result of whole genome duplication, is associated with speciation and adaptation. To fuel their often remarkable adaptations, polyploids may access and maintain adaptive alleles more readily than diploids. Here, we identify repeated signals of selection on genes that are thought to mediate adaptation to whole genome duplication in two Arabidopsis species. We found that the tetraploid-characteristic haplotypes, found in genes exhibiting the most robust signals of selection, were never present in their diploid relatives. Instead, these haplotypes were made of novel ‘mosaics’ forged from multiple allelic sources. We hypothesize that this increased variation forms the genic basis of the potentially eased adaptation of polyploids.

Introduction

Whole genome duplication (WGD) is widespread across eukaryotes, especially in plants. It comes with significant costs, such as meiotic instability and cell cycle changes, both of which require adaptation to WGD [1,2]. Over the last decade, a body of work has accrued describing means by which within-species WGD lineages (taxonomic autopolyploids) or WGD lineages with polysomic inheritance (random chromosome pairing; genetic autopolyploids [3,4]) overcome these challenges. The most obvious of these appears to be a stabilization of cell division via selection acting at meiotic and mitotic genes [5–8]. Additionally, cyclin genes have been observed to mediate tolerance to tetraploidization in polyploid tumors and in proliferating Arabidopsis tissues [9–12]. While certain genes involved in adaptation to WGD have been identified and their functions verified [13–15], a broader evolutionary context remains elusive. Questions persist regarding the origins of this adaptive genetic variation and the mechanisms by which these variants assemble into positively selected haplotypes in nascent polyploids.

In diploids, various sources contribute to the pool of adaptive alleles, including gene flow/introgression, ancestral standing variation, and de novo mutations (Fig 1A). Recent work indicates the potential for all these sources to become greater following WGD (Fig 1A). First, polyploid lineages can benefit from the weakening of hybridization barriers [16–18], leading to increased allelic exchange through introgression [19,20]. Polyploids can also gain additional diversity via introgression from sister diploids, whereas introgression in the opposite direction is much less likely due to the nature of the diploid-tetraploid barrier [21–24]. Second, polyploids maintain genetic diversity more efficiently due to increased masking of recessive alleles [25,26], thus keeping a larger pool of standing variation [27]. Finally, polyploids experience elevated mutational input due to the doubled chromosome numbers, leading to a higher number of novel mutations per generation [28]. Consequently, autopolyploids are hypothesized to generate and maintain a greater degree of genetic variability than diploids.

Fig 1. Hypotheses about allelic sources in a diploid-autotetraploid system.


A: As compared to diploids, autotetraploid lineages may acquire alleles via increased introgression potential, higher retention of ancestral standing variation, and increased population-scaled mutational input. Shown are two diploid species (red), each of which gave rise to an autotetraploid lineage (blue). B: Higher variability in the possible sources of potentially beneficial alleles suggests that positively selected haplotypes in tetraploids might be more likely to form from a mosaic of sources (right), in contrast to the traditionally assumed homogeneous (single-source) scenario (left).

Traditionally, beneficial alleles have been envisioned as originating from a single source (Fig 1B), whether introgression, standing variation, or de novo mutations [29–33]. However, recent evidence suggests that alleles can also gradually accrue to form finely tuned haplotypes [34–36]. This implies that adaptive haplotypes may accumulate from multiple sources, rather than just one (Fig 1B). Here, we ask whether polyploids use their expanded allelic sources to construct novel fine-scaled ‘mosaic’ haplotypes of diverse origins, what those exact origins are, and whether the genes involved in adaptation to WGD form an exception compared to the genomic background.

The naturally ploidy-variable (diploid and autotetraploid) species Arabidopsis arenosa and Arabidopsis lyrata have emerged as powerful models to understand adaptation to WGD [37]. Recent research in these species identified candidate alleles mediating adaptation to WGD that were shared between A. arenosa and A. lyrata through tetraploid-specific adaptive introgression [5,7,20,35]. There have been indications that tetraploidy-related candidate adaptive alleles might have originated from specific source populations, or that recombination may be involved in their construction [20,35]. However, these studies were based on small sample sizes and did not consider a role for trans-specific polymorphism (here represented by ancestral alleles shared among different Arabidopsis species).

Here, we take advantage of an exhaustive dataset of 983 sequenced individuals encompassing all known lineages of European Arabidopsis autotetraploids, backed by genome-wide diversity of all diploid outcrossing Arabidopsis species. Apart from the tetraploid A. lyrata lineage from the eastern Austrian Forealps (‘Austria’ hereafter; [20,38]), we newly genomically characterized two additional A. lyrata tetraploid lineages from Central Europe: south-eastern Czechia (‘Czechia’ hereafter) and Harz in Germany (‘Germany’ hereafter). We performed a joint analysis of these three A. lyrata tetraploid lineages and tetraploid A. arenosa, the latter derived from a single WGD event [26,39]. Using this sampling of four tetraploid lineages, we established which genes and processes are robustly under selection in all of them and elucidated the sources of candidate alleles for adaptation to WGD. Building on this knowledge, we asked to what extent tetraploid-specific haplotype blocks (hereafter ‘haplotypes’) were formed and what their evolutionary sources were.

Results

Introgression between all four autotetraploid Arabidopsis lineages in Central Europe

We found that A. lyrata autotetraploids occur at geographically distinct locations throughout Central Europe (Fig 2A and S1 and S2 Data). To obtain comparable and representative samples of A. lyrata and A. arenosa populations for diploid-tetraploid selection scans, we first analyzed 154 individuals from 17 proximal diploid and tetraploid populations (Fig 2A). We assessed population structure and potential admixture using bootstrapped allele covariance trees (TreeMix [40]; Figs 2B and 2C), principal component analysis (S1 Fig), neighbor-joining networks (SplitsTree [41]; S1 Fig), and Bayesian clustering (fastSTRUCTURE [42]; S1 Fig). In A. arenosa, we identified a single Central European tetraploid lineage (Fig 2B) in line with previous range-wide studies [26,39]. Notably, A. lyrata tetraploids consisted of three differentiated lineages in Austria, Czechia, and Germany (Fig 2C). Both species and cytotypes maintained high genetic diversity (A. arenosa: mean nucleotide diversity over four-fold degenerate sites (4d-π) = 0.026/0.024 for diploids/tetraploids; A. lyrata: mean 4d-π = 0.012/0.016 for diploids/tetraploids; S1 Table). Although Central European A. lyrata has a patchier distribution and showed lower nucleotide diversity than A. arenosa, Tajima’s D was close to zero in both species and cytotypes, consistent with populations near neutral equilibrium (mean Tajima’s D in A. arenosa = 0.01/0.20 for diploids/tetraploids; mean Tajima’s D in A. lyrata = 0.27/0.23 for diploids/tetraploids; S1 Table).
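The diversity summaries above can be illustrated with a minimal Python sketch of nucleotide diversity (π) and Tajima’s D computed from derived-allele counts at segregating four-fold degenerate sites. It assumes that each tetraploid individual contributes four sampled chromosomes and uses the standard constants of Tajima’s D [71]; the function names are hypothetical, and the ScanTools_ProtEvol pipeline [31] may compute these statistics differently (for example, per window or from allele frequencies).

```python
import math
from typing import Sequence


def pairwise_diversity(n: int, derived_counts: Sequence[int]) -> float:
    """Sum over segregating sites of the expected pairwise difference.

    n: number of sampled chromosomes (assumed 2 per diploid, 4 per tetraploid)
    derived_counts: derived-allele count k, with 0 < k < n, at each segregating site
    """
    return sum(2 * k * (n - k) / (n * (n - 1)) for k in derived_counts)


def nucleotide_diversity(n: int, derived_counts: Sequence[int], n_sites: int) -> float:
    """Per-site pi: divide by all callable sites, including invariant ones."""
    return pairwise_diversity(n, derived_counts) / n_sites


def tajimas_d(n: int, derived_counts: Sequence[int]) -> float:
    """Tajima's D; values near 0 are consistent with neutral equilibrium."""
    s = len(derived_counts)  # number of segregating sites
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    # Difference between the pairwise (pi) and segregating-site (Watterson) estimators
    diff = pairwise_diversity(n, derived_counts) - s / a1
    return diff / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy example: 5 tetraploid individuals (n = 20 chromosomes), 1000 4d sites.
counts = [1, 3, 10, 12, 19, 2]
print(round(nucleotide_diversity(20, counts, 1000), 5))
print(round(tajimas_d(20, counts), 3))
```

An excess of rare variants pushes D below zero, whereas an excess of intermediate-frequency variants pushes it above zero.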

Fig 2. Evolutionary relationships and genomic signatures of selection in tetraploid populations of A. lyrata and A. arenosa.


A: Locations of the focal 11 A. lyrata populations (AL Austria, AL Czechia, and AL Germany) and six A. arenosa populations (AA CEur) in Central Europe. Red and blue dashed lines show the ranges of diploid and tetraploid A. lyrata, respectively; tetraploids of A. arenosa occur throughout the entire area. For the map, the base layer ‘World Imagery’ was taken from https://www.usgs.gov/products/, which is granted to be published under CC BY 4.0 license. B, C: Phylogenetic relationships among the focal populations of A. arenosa (B) and A. lyrata (C) inferred by TreeMix analysis assuming no migration. Asterisks show bootstrap support = 100%. D: Introgression among tetraploid but not diploid populations of both Arabidopsis species. ABBA-BABA statistics demonstrate excess allele sharing between tetraploids (bottom tree), but not between diploids (top tree), of A. arenosa and each of the three A. lyrata tetraploid lineages. P1 is BDO, the earliest diverging and spatially isolated diploid population of A. arenosa. E: A set of 14 unlinked genes showing significant evidence (p < 0.01) of positive selection (positively selected genes, PSGs) in the four tetraploid lineages identified using PicMin. Three additional genes, HEI10, SDS, and SYN1, were identified using a screen for candidate SNPs. F: Functional characterization of the 17 PSGs by STRING analysis. The network shows predicted protein-protein interactions among the PSGs. The width of each line corresponds to the confidence of the interaction prediction. PSGs were annotated into four processes, each represented by a bubble of different color. The 12 PSGs with names written in bold had enough candidate SNPs for the reconstruction of tetraploid haplotypes (see the main text).

Next, we examined introgression between the two species using D-statistics (ABBA-BABA statistics [43]). While there was no significant signal among diploids, we identified evidence of introgression (D between 0.106 and 0.108; Fig 2D) between tetraploids of A. arenosa and each of the three tetraploid A. lyrata lineages.
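To make the ABBA-BABA logic concrete, the following sketch computes Patterson’s D from population allele frequencies (the frequency-based form of the statistic) and assesses it with a delete-one block jackknife. The population roles follow Fig 2D only loosely: P1 would be the diploid A. arenosa population BDO, P2 an A. arenosa population, P3 an A. lyrata tetraploid lineage, and P4 an outgroup. The outgroup choice, the block size, and all function names are assumptions for illustration, not the authors’ implementation. Positive D indicates excess allele sharing between P2 and P3.

```python
import math
from typing import List, Sequence, Tuple


def abba_baba_terms(p1, p2, p3, p4) -> Tuple[List[float], List[float]]:
    """Per-site ABBA and BABA weights from derived-allele frequencies."""
    abba, baba = [], []
    for a, b, c, d in zip(p1, p2, p3, p4):
        abba.append((1 - a) * b * c * (1 - d))
        baba.append(a * (1 - b) * c * (1 - d))
    return abba, baba


def d_statistic(abba: Sequence[float], baba: Sequence[float]) -> float:
    """D = (sum ABBA - sum BABA) / (sum ABBA + sum BABA)."""
    total = sum(abba) + sum(baba)
    return (sum(abba) - sum(baba)) / total if total > 0 else float("nan")


def block_jackknife(abba: List[float], baba: List[float], block_size: int):
    """Return (D, standard error, Z) using delete-one blocks of consecutive sites.

    Blocks are defined by site index for simplicity; real analyses usually use
    physical blocks large enough to span linkage disequilibrium.
    """
    d_full = d_statistic(abba, baba)
    leave_one_out = []
    for start in range(0, len(abba), block_size):
        end = start + block_size
        leave_one_out.append(
            d_statistic(abba[:start] + abba[end:], baba[:start] + baba[end:])
        )
    m = len(leave_one_out)
    mean = sum(leave_one_out) / m
    se = math.sqrt((m - 1) / m * sum((x - mean) ** 2 for x in leave_one_out))
    z = d_full / se if se > 0 else float("nan")
    return d_full, se, z


# Hypothetical frequencies at three sites for P1..P4.
abba, baba = abba_baba_terms([0.0, 0.0, 0.1], [1.0, 1.0, 0.5], [1.0, 1.0, 0.5], [0.0, 0.0, 0.0])
print(d_statistic(abba, baba))
```

A |Z| above roughly 3 is a commonly used threshold for calling D significantly different from zero.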

Genomic signatures of repeated differentiation associated with WGD

To infer candidate genes associated with signatures of selection in diploid-tetraploid contrasts, we compared each of the four tetraploid A. lyrata/A. arenosa lineages to their geographically most proximal diploids (Figs 2A–2C). Using PicMin [44], which identifies genomic windows with repeated signals of selection across lineages, we identified 54 repeatedly differentiated windows (PicMin FDR-corrected q-value < 0.01; S3 Data), overlapping 14 unlinked candidate genes (Fig 2E). To capture narrower peaks of differentiation, we also searched for genes carrying SNPs that were highly differentiated between diploids and tetraploids in multiple lineages (see Methods). Such a ‘candidate SNP’ approach can identify genes undergoing positive selection in tetraploids on a subset of SNPs while the evolution of the rest of the gene sequence is constrained. This analysis, with a 1% outlier Fst cutoff, identified all 14 PicMin candidate genes and three additional candidates (HEI10, SDS, and SYN1), each with multiple outlier SNPs differentiated across all four tetraploid lineages. In total, we recognized 17 candidate positively selected genes (PSGs) showing robust, repeated selection signals in tetraploid lineages (S2 Table). Notably, these loci do not cluster in regions characterized by extreme recombination rates (S2 Fig), indicating no bias associated with the recombination landscape [45].
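The ‘candidate SNP’ screen can be sketched as follows: compute a per-SNP differentiation statistic for each diploid-tetraploid contrast, flag the top 1% of SNPs within each lineage, keep SNPs flagged in every lineage, and retain genes carrying several such SNPs. This is a minimal sketch under stated assumptions: it uses Nei’s Fst from allele frequencies, requires the same SNP to be an outlier in all lineages, and uses a hypothetical minimum of two outlier SNPs per gene. The authors’ exact estimator and filters are described in their Methods and may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def nei_fst(p_dip: float, p_tet: float) -> float:
    """Fst = (H_T - H_S) / H_T for one biallelic SNP and two populations."""
    p_bar = (p_dip + p_tet) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p_dip * (1 - p_dip) + 2 * p_tet * (1 - p_tet)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def top_outliers(values: Sequence[float], top_fraction: float = 0.01) -> Set[int]:
    """Indices of SNPs at or above the top-`top_fraction` cutoff (ties kept)."""
    k = max(1, math.ceil(top_fraction * len(values)))
    threshold = sorted(values, reverse=True)[k - 1]
    return {i for i, v in enumerate(values) if v >= threshold}


def candidate_genes(
    snp_genes: List[str],
    fst_by_lineage: Dict[str, List[float]],
    top_fraction: float = 0.01,
    min_snps: int = 2,
) -> Set[str]:
    """Genes with >= min_snps SNPs that are Fst outliers in every lineage."""
    shared = None
    for values in fst_by_lineage.values():
        outliers = top_outliers(values, top_fraction)
        shared = outliers if shared is None else shared & outliers
    counts = Counter(snp_genes[i] for i in (shared or set()))
    return {gene for gene, n in counts.items() if n >= min_snps}
```

Intersecting per-lineage outlier sets is what makes the screen sensitive to repeated, rather than lineage-specific, differentiation.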

To determine which molecular processes the 17 WGD-associated PSGs are involved in, we predicted protein-protein interactions among them using STRING [46]. We retrieved two interconnected clusters: control of cell cycle and endoreduplication, represented by the genes CYCA2;3, CYCD3;2, CYCD5;1, and chromosome pairing and segregation during meiosis, with the genes ASY1, ASY3, HEI10, PDS5b, SCC4, SDS, SYN1, ZYP1b (p < 0.001, Fisher’s exact test; Fig 2F and S3 Table). The remaining six of the 17 PSGs were not connected in this interaction network. Four of these six genes were found to be related to mRNA transcription (AT2G33845, NRPB9A, TfIIFbeta, ZOP1; Fig 2F). AT4G18490 encodes an unknown protein that is strongly expressed in young A. thaliana flower buds and involved in cell division [47]. AGC1.5 was not connected to any of these processes. It has recently been shown to be involved in maintaining functional pollen tubes in tetraploid A. arenosa [15]. We provide further functional interpretations of the candidate PSGs in S1 Text. The candidate PSGs varied in their proportion of differentiated cis-regulatory, nonsynonymous, and synonymous SNPs (S3 Fig and S1 Text).

Some phenotypic shifts in polyploids have been associated with the molecular processes identified above in Arabidopsis. Specifically, cytological studies reported stable meiotic chromosome segregation in established tetraploids of A. arenosa and A. lyrata compared to neo-tetraploids of A. arenosa, which suggests a compensatory shift in the meiotic stability phenotype [5,13,14,20,35]. Here, we investigated whether endoreduplication follows a similar pattern of compensatory evolution after WGD, in line with the suggestion that “whole-plant polyploidy may have a subtle inhibitory effect on the extent to which cells undergo endopolyploidy” [1]. Using flow cytometry, we found a substantial decrease in endoreduplication within established tetraploids compared to synthetic tetraploids, while accounting for the variation in diploids (S4 and S5 Figs and S2 Text). This finding suggests compensatory changes, possibly reflecting a high cost of additional rounds of genome duplication in polyploids and/or structural constraints on maximum cell size [48,49]. It should be highlighted, however, that causality between the observed phenotype and variation in the relevant PSGs needs further functional validation.
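Endoreduplication levels measured by flow cytometry are often summarized as a cycle value, the mean number of endocycles per nucleus. Which index the authors used is not stated in this section (see S2 Text), so the sketch below is an assumption. Nuclei counts are ordered from the unreplicated base C-value upward (2C, 4C, 8C, ... for diploids; 4C, 8C, 16C, ... for tetraploids), so that each ploidy is scored relative to its own base genome size. The counts in the example are hypothetical.

```python
from typing import Sequence


def cycle_value(nuclei_counts: Sequence[int]) -> float:
    """Mean number of endocycles per nucleus.

    nuclei_counts[i] = number of nuclei with 2**i times the base (unreplicated)
    DNA content, i.e. [n2C, n4C, n8C, ...] for a diploid and
    [n4C, n8C, n16C, ...] for a tetraploid.
    """
    total = sum(nuclei_counts)
    if total == 0:
        raise ValueError("no nuclei counted")
    return sum(i * n for i, n in enumerate(nuclei_counts)) / total


# Hypothetical leaf samples (nuclei per flow-cytometry peak).
print(cycle_value([400, 300, 200, 100]))  # diploid: peaks 2C, 4C, 8C, 16C
print(cycle_value([500, 350, 150]))       # tetraploid: peaks 4C, 8C, 16C
```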

Tetraploid haplotypes are assembled from diverse allelic sources

Next, we tested whether these positively selected regions in tetraploids aggregate as mosaics from disparate allelic sources (Fig 1B). To do this, we first constructed the haplotypes of the PSGs and identified the specific allelic sources for every candidate tetraploid SNP. After removing five PSGs with insufficient variation (fewer than five candidate SNPs), we analyzed 12 genes comprising in total 232 candidate SNPs (S4 Data). For each allele, we reconstructed diploid- or tetraploid-characteristic haplotype blocks using these candidate SNPs as markers following [7]. These haplotype blocks corresponded to physical haplotypes observed in PacBio HiFi reads from five diploid and five tetraploid individuals of A. arenosa (71% correspondence across 132 checked long reads, 95% correspondence if heterozygous sites were excluded; S5–S7 Data), and we term them ‘haplotypes’ hereafter for simplicity.

We identified a single major tetraploid-characteristic haplotype for each of the 12 candidate genes that was shared across both A. lyrata and A. arenosa tetraploids (present in 92% and 98% of populations, respectively, mean frequency = 0.62 across 61 tetraploid populations; Fig 3B and S4 Table). We further identified two major diploid haplotypes for each gene, one A. lyrata-characteristic and one A. arenosa-characteristic (Fig 3B; see Fig 3A for the locations of populations of the different Arabidopsis species and cytotypes used in this analysis). Other haplotypes were of minor frequency (Fig 3B). Importantly, none of the tetraploid haplotypes associated with the 12 PSGs were found in any diploid individuals, even though our sampling included all of the known potential source lineages for the tetraploids.
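The haplotype frequencies in Fig 3B can be thought of as a classification followed by a count. In the minimal sketch below, each reconstructed haplotype is written as a string of alleles at a gene’s candidate SNPs and is assigned to a reference haplotype (tetraploid, A. arenosa diploid, or A. lyrata diploid) when it differs at no more than `max_mismatches` sites; otherwise it is labeled ‘other’, mirroring the grey category. The reference strings, the mismatch tolerance, and the function names are hypothetical; the authors reconstructed haplotype blocks following [7], which may use different matching rules.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def hamming(a: str, b: str) -> int:
    """Number of candidate-SNP positions at which two haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same candidate SNPs")
    return sum(x != y for x, y in zip(a, b))


def classify(hap: str, references: Dict[str, str], max_mismatches: int = 0) -> str:
    """Label of the closest reference within the tolerance, else 'other'.

    Ties go to the reference listed first.
    """
    best_label, best_dist = "other", max_mismatches + 1
    for label, ref in references.items():
        dist = hamming(hap, ref)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def haplotype_frequencies(
    samples: Iterable[Tuple[str, str]],
    references: Dict[str, str],
    max_mismatches: int = 0,
) -> Dict[str, Dict[str, float]]:
    """Per-population frequency of each haplotype class.

    samples: (population, haplotype) pairs, one per sampled chromosome copy.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for pop, hap in samples:
        counts[pop][classify(hap, references, max_mismatches)] += 1
    return {
        pop: {label: n / sum(c.values()) for label, n in c.items()}
        for pop, c in counts.items()
    }
```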

Fig 3. Distribution of haplotypes of the 12 positively selected genes (PSGs) involved in adaptation to WGD.


A: Populations in this analysis (504 diploids and 479 tetraploids). Arabidopsis congeners are A. halleri, A. croatica, A. cebennensis, and A. pedemontana. The map was vectorized and modified according to the base layer ‘Dark Gray canvas’, https://www.usgs.gov/products/, which is granted to be published under CC BY 4.0 license. B: Frequency of shared tetraploid (blue), A. arenosa diploid (light red), and A. lyrata diploid (red) haplotypes in each of the 12 PSGs. All other haplotypes present (including possible recombinants of the above) are shown in grey. Note the absence of tetraploid haplotypes in diploids.

Despite the absence of each of the 12 tetraploid haplotypes (corresponding to the 12 PSGs) in diploids, certain tetraploid SNPs defining these haplotypes were detected across diploid populations, present in one or even across multiple species (Fig 4). We therefore investigated the possibility that the tetraploid haplotypes might be assembled from multiple evolutionary sources (introgression, standing variation, de novo mutations). To do this, we inferred the most likely source for each candidate SNP marking these tetraploid haplotypes. Considering the extent of trans-specific polymorphism in outcrossing Arabidopsis [50], we worked with the full dataset of A. arenosa, A. lyrata, and all other diploid Arabidopsis outcrossers (A. halleri, A. croatica, A. cebennensis, and A. pedemontana) (S1 and S2 Data). Using the phylogenetic relationships among these species ([51], Fig 4A) and the SNP presence/absence data from all 504 diploid individuals, we identified the most likely source for each of the 232 candidate SNPs (Fig 4 and S8 Data). The different allelic sources forming tetraploid haplotypes varied genome-wide (when analyzing all 12 PSGs together), but also within individual tetraploid haplotypes (3–6 scenarios per haplotype, median = 4, out of 7; Figs 4 and S6). Overall, 65.5% of the SNPs forming tetraploid haplotypes were most parsimoniously inferred as trans-specific polymorphism, segregating in diploids of multiple Arabidopsis species (Fig 4A, scenarios 1–4). Note that this number is likely an underestimate as some alleles may have remained unsampled or have gone extinct in any diploid lineage. Only 6.9% of the candidate SNPs were inferred as arising from standing variation from a single diploid progenitor (A. arenosa or A. lyrata; Fig 4A, scenarios 5, 6), and the remaining 27.6% of SNPs possibly accumulated de novo in tetraploids or remained unsampled in our dataset (absent in any of the 504 diploid individuals; Fig 4A, scenario 7). 
Detailing trans-specific polymorphism, we further quantified that 35.8% of the candidate SNPs marking the tetraploid haplotypes were likely contributed from diploid A. arenosa (present in this species and possibly in diploids of other species, but not A. lyrata; Fig 4A, scenarios 2, 5), while 3.4% likely came from diploid A. lyrata (Fig 4A, scenarios 3, 6). Finally, in the process of tetraploid haplotype formation through these identified source scenarios, 71.1% of the candidate SNPs were indicative of interspecific introgression across the tetraploid lineages. This was inferred from their presence in tetraploids and in at most one of the diploid progenitors (either diploid A. arenosa or A. lyrata; note that this scenario is nested within others; Fig 4A, scenarios 2–7).
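The assignment of candidate SNPs to source scenarios can be expressed as a small decision rule on allele presence in the diploid groups. The mapping below is one reading of the scenario descriptions in the text (scenarios 1–4 trans-specific; 2 and 5 from diploid A. arenosa; 3 and 6 from diploid A. lyrata; 7 absent from all diploids; 2–7 indicative of introgression) and should be treated as an assumption; the authoritative definitions are those in Fig 4A. ‘Outgroup species’ here means the other diploid outcrossers (A. halleri, A. croatica, A. cebennensis, A. pedemontana), and all names in the code are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class SnpPresence(NamedTuple):
    """Presence of a tetraploid candidate allele among sampled diploids."""
    in_arenosa_2x: bool
    in_lyrata_2x: bool
    n_outgroup_species: int  # other diploid Arabidopsis species carrying the allele


def source_scenario(s: SnpPresence) -> int:
    """Assumed mapping of a presence pattern to scenarios 1-7."""
    if s.in_arenosa_2x and s.in_lyrata_2x:
        return 1
    if s.in_arenosa_2x:
        return 2 if s.n_outgroup_species >= 1 else 5
    if s.in_lyrata_2x:
        return 3 if s.n_outgroup_species >= 1 else 6
    return 4 if s.n_outgroup_species >= 1 else 7


GROUPS = {
    "trans_specific": {1, 2, 3, 4},
    "from_arenosa_2x": {2, 5},
    "from_lyrata_2x": {3, 6},
    "single_progenitor": {5, 6},
    "absent_from_diploids": {7},
    "introgression": {2, 3, 4, 5, 6, 7},  # in at most one diploid progenitor
}


def summarize(snps: Iterable[SnpPresence]) -> Dict[str, float]:
    """Fraction of candidate SNPs falling into each scenario group."""
    scenarios = [source_scenario(s) for s in snps]
    n = len(scenarios)
    return {name: sum(x in members for x in scenarios) / n for name, members in GROUPS.items()}
```

Because the groups overlap (for example, scenario 2 counts as both trans-specific and introgression-indicative), the reported fractions need not sum to one.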

Fig 4. Mosaic of allelic sources of tetraploid haplotypes corresponding to the 12 positively selected genes (PSGs) associated with adaptation to WGD.


A: The 232 candidate SNPs marking the 12 tetraploid haplotypes were categorized into one of the seven source scenarios based on their allele distribution in the 983 samples. The most parsimonious origin of each pattern is provided in italics. Four scenarios (outlined by orange frame) involve trans-specific standing variation shared among diploids of at least two Arabidopsis species (‘standing in diploids’). Six scenarios (outlined by blue frame) require introgression between tetraploid lineages. Boxes ‘min 1’ show that the tetraploid standing variation is present in at least one outgroup species. Phylogenetic relationships according to [51], Fig 4A. B: Variable source scenarios for each of the 12 PSGs. Barplots show the proportion of candidate SNPs representing each of the seven source scenarios; numbers above bars indicate the number of candidate SNPs per PSG. Two example PSGs detailed in panel C are highlighted in bold. C: Illustration of diploid and tetraploid haplotypes for two PSGs. Bold capital letters represent haplotype-defining alleles, and two letters at a position indicate the presence of an alternative minor allele. Coding sequences are highlighted as black boxes as part of the gene model above the haplotypes (only regions overlapping with candidate SNPs are shown). The bottom line ‘allelic source’ displays the SNP assignment to its source scenario as defined in panel A. Possible recombination breakpoints required to construct the haplotype from the different allelic sources are marked with asterisks. Top: example from the CYCA2;3 endoreduplication gene, displaying 19 candidate SNPs spanning 5371 bp. Bottom: example from the PDS5b meiosis gene, depicting candidate SNPs spanning 8904 bp.

In summary, this analysis shows that tetraploid haplotypes represent a combination of reuse of trans-specific as well as species-specific SNPs, with additional input of de novo mutations after WGD (Figs 4B and 4C). The mosaic nature of allelic sources was further supported by the reticulation observed in the neighbor-joining networks of the 12 PSGs (S7 Fig). In these networks, tetraploids form a single lineage that connects through multiple splits to multiple diploid Arabidopsis lineages. This variability of allelic sources was also found for the genomic background (232 synonymous SNPs randomly sampled 1000 times). Interestingly, however, the proportion of trans-specific polymorphism in the 232 candidate SNPs was higher compared to the genomic background average (65.5% compared to 50.0%, respectively, p-value = 0.001, 1000 permutations; Fig 4A and S5 Table). Similarly, the proportion of SNPs indicative of introgression across the tetraploid lineages was much higher for the candidate SNPs than for the average of the genomic SNP subset (71.1% compared to 56.4%, respectively, p-value = 0.002, 1000 permutations; Fig 4A and S5 Table). Finally, the proportion of SNPs exclusively present in tetraploids was slightly but nonsignificantly higher for the 232 candidate SNPs compared to the average of the genomic background (27.6% compared to 24.5%, respectively, p-value = 0.493, 1000 permutations; Fig 4A and S5 Table).
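The comparison with the genomic background can be sketched as a resampling test: draw 232 synonymous SNPs at random 1000 times, record the proportion falling into a given category (for example, trans-specific) in each draw, and ask how often the background proportion reaches the value observed for the candidate SNPs. The one-sided empirical p-value below uses the common (k + 1) / (n + 1) convention; whether the authors used this convention, the hypothetical 50% background, and the function name are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def resampling_test(
    observed_fraction: float,
    background_flags: Sequence[bool],
    sample_size: int,
    n_reps: int = 1000,
    seed: int = 1,
) -> Tuple[float, float]:
    """One-sided empirical p-value and mean null fraction.

    background_flags: one boolean per background (e.g. synonymous) SNP, True if
    the SNP falls into the category of interest (e.g. trans-specific).
    """
    rng = random.Random(seed)
    flags: List[bool] = list(background_flags)
    # Null distribution: category fraction in random background subsets of equal size
    null = [sum(rng.sample(flags, sample_size)) / sample_size for _ in range(n_reps)]
    k = sum(frac >= observed_fraction for frac in null)
    return (k + 1) / (n_reps + 1), sum(null) / n_reps


# Hypothetical background in which half of the synonymous SNPs are trans-specific.
background = [True, False] * 5000
p, null_mean = resampling_test(0.655, background, sample_size=232)
print(p, round(null_mean, 3))
```

Sampling without replacement within each draw keeps every background subset the same size as the candidate set, so the two proportions are directly comparable.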

Discussion

To exhaustively assess sources of adaptive variation in young autotetraploids of A. arenosa and A. lyrata, we sampled all regions possibly harboring the tetraploid cytotype, focusing on Central European A. lyrata [20,35,38,52] and all European diploid outcrossing Arabidopsis species. Notably, A. lyrata tetraploids consisted of three differentiated lineages in Austria, Czechia, and Germany that were genetically close to parapatric diploid populations of the same species. This suggests either independent formation and establishment of tetraploid populations in each region, or a single origin of autotetraploid A. lyrata followed by allopatry and rampant local introgression from parapatric diploid A. lyrata lineages. In A. arenosa, we identified a single Central European tetraploid lineage, consistent with previous studies [26,39]. Our finding of introgression between tetraploids of A. arenosa and each of the three tetraploid A. lyrata lineages is in line with previous experimental studies: diploids of both species exhibit strong postzygotic barriers, whereas polyploidy-mediated hybrid seed rescue (i.e. reestablishment of endosperm cellularization through polyploidy [17]) enables hybridization between tetraploid A. arenosa and A. lyrata [20,38]. Overall, we identified four tetraploid lineages in Central European Arabidopsis, consistent with two to four independent WGD events (one in each species and possibly up to two additional WGD events in A. lyrata).

We found 17 PSGs shared across the tetraploid lineages, mediating processes of homologous chromosome pairing during prophase I of meiosis, cell cycle timing and regulation of endoreduplication via different classes of cyclins, and mRNA transcription via RNA polymerase II. A similar set of WGD-associated candidate genes was found in [5] for A. arenosa and in [20] for the Austrian lineage of A. lyrata. The interacting meiosis and cell cycle regulation PSGs likely mediate beneficial phenotypic shifts resulting in the establishment of tetraploid lineages. Previous work reported relatively stable meiotic chromosome segregation in established tetraploids of A. arenosa compared to newly synthesized tetraploids of A. arenosa [5,13,14], indicating compensatory evolution following WGD. Here, we show signals of such compensatory evolution also in the case of endoreduplication, as established A. arenosa tetraploids exhibited a decrease in endoreduplication compared to newly synthesized tetraploids. Additional mechanistic work is required to understand the functional and strategic basis of this shift. It may be a homeostatic response to maintain cell size, overall DNA content, or mitotic stability at high ploidies [48,49]. Recent studies have repeatedly demonstrated that certain genes involved in meiosis and cell cycle regulation are the core mediators of adaptation to WGD in A. arenosa and A. lyrata [5,20]. Yet, additional processes, like ion management [6,8] or pollen tube growth [15], have been found to play a role in adaptation to WGD across the family Brassicaceae, suggesting that alternative solutions to similar challenges may exist. Where the previous literature stops, however, is in detailed sourcing of the candidate adaptive alleles and their exact origins, which is here recognized as highly mosaic. Indeed, both theoretical and empirical evidence indicates that polyploids may have greater access to allelic variation than diploids [19,25,26,28].
In Arabidopsis tetraploids it is largely ancestral standing variation in the form of trans-specific polymorphism, segregating in diploids of multiple Arabidopsis species, that sources allelic variation in the PSGs. This supports the growing recognition of the role of trans-specific standing polymorphism in adaptation [53,54]. Introgression among tetraploids also substantially contributes to allelic variation in tetraploids, aligning with the observed signatures of gene flow among tetraploids of both species [20,38] and with the general recognition of introgression to generate potentially beneficial diversity within polyploid species complexes (reviewed in [16,18,55]). De novo mutations appear to have a subordinate role in tetraploid allelic variation, although they may nevertheless be crucial for rapid protein (co-)evolution during meiotic adaptation, as indicated in A. arenosa [7].

Overall, each haplotype corresponding to the PSGs comprised SNPs of varying ancestry, including trans-specific and species-specific polymorphisms from closer and more distant relatives, along with likely de novo mutations in the tetraploids. These ‘mosaic’ haplotypes were then likely shared via introgression among tetraploids and spread across Europe. Variable allelic sources were also found for the genomic background (i.e. subset of randomly selected synonymous SNPs), although trans-specific polymorphism, introgression, and de novo mutations in tetraploids were more frequent for the candidate SNPs. This finding is, however, biased by the fact that we sampled only candidate SNPs that repeatedly overlapped between A. lyrata and A. arenosa, as the PSGs constitute a subset of genes replicated across all tetraploid lineages. It is therefore expected that the proportion of introgression-indicative variants is higher in the PSGs compared to the genomic background. Most of the trans-specific polymorphism for both the candidate SNPs and the genomic background SNPs stems from A. arenosa and at least one outgroup species, and we hypothesize that this is due to the basal position of A. arenosa in a novel phylogeny of the genus Arabidopsis that we recently generated. This distinct phylogenetic placement of A. arenosa may have enabled both the accumulation of private alleles and allele sharing with derived species. In addition, the high nucleotide diversity of diploid A. arenosa (on average about twice that of A. lyrata) allowed for retention of a large allelic pool to serve as raw material for subsequent rapid post-WGD selection.

Importantly, none of these tetraploid haplotypes were found in any diploid individuals, which indicates that they likely assembled upon the establishment of tetraploids. The rearrangement of diverse allelic sources into ‘mosaic’ tetraploid haplotypes may be facilitated by high recombination rates reported for these Arabidopsis species [56,57] and the observation of enhanced recombination rates in neo-tetraploid A. arenosa and A. thaliana [13,58]. Based on estimates of the age of the tetraploids of A. arenosa and A. lyrata, which range from approximately 20,000 to 230,000 generations [20,26,39], we speculate that the major tetraploid haplotype of each PSG rapidly spread across Europe. We provide hypotheses about the spatio-temporal context of the origin of these ‘mosaic’ haplotypes in S3 Text.

Altogether, we show that extensive reshuffling of trans-specific, species-specific, and novel variation occurred after WGD. This study demonstrates that polyploid lineages leverage their enhanced capacity to accumulate genetic variation from various sources to generate new, potentially advantageous haplotypes. Importantly, this process is not unique to polyploids, as diploids have also been reported to accumulate multiple adaptive changes into finely tuned haplotypes [34,36,59]. Recent research in both plants and animals reveals substantial variation in the contributions of standing, introgressed, and de novo alleles to adaptation at the entire gene level [31–33,60]. The diversity of allelic sources in individual haplotypes in tetraploid Arabidopsis implies that pathways to adaptation may be even more diverse when considering the assembly of individual variants into haplotypes.

Material and methods

Sampling

Arabidopsis, a well-studied plant model genus, is primarily diploid, but autotetraploids have been discovered among the genetically diverse outcrossing species A. arenosa and A. lyrata in Central Europe. While A. arenosa diploids and tetraploids are widespread, A. lyrata primarily occupies a narrow ecological niche in Central Europe. Previous research suggested that introgression between A. arenosa and A. lyrata tetraploids facilitated the genetic differentiation of A. lyrata tetraploids from their diploid ancestors, allowing them to expand their niche [20,38].

Here, we sampled and sequenced genomes from both diploid and neighboring tetraploid populations across all known diploid-tetraploid lineages in Central European Arabidopsis. Our dataset includes genomes of a total of 73 newly sequenced diploid and tetraploid individuals of both species. Additionally, we incorporated data from 910 previously published whole genome sequences [20,26,31,32,50,53,61–65], giving a total of 983 individuals from 129 populations (S1 and S2 Data). The ploidy level of each individual in our new sampling was confirmed using flow cytometry following [66].

Population genetic structure

Samples were whole genome resequenced (as detailed in S1 Data), and single nucleotide polymorphisms (SNPs) were called using a ploidy-aware approach following [26,31]. Arabidopsis arenosa and A. lyrata autotetraploids show random segregation of chromosomes with a tendency towards bivalent formation [5,20,35,39]. Bivalent formation during metaphase I of meiosis in established autotetraploids results from reduced crossover frequencies relative to neo-tetraploids or diploid relatives [3]. As a result, allele frequency estimation in these tetraploids is comparable to that in diploids [67], which allowed us to use a set of methods designed for diploids that are also applicable to mixed-ploidy data (for a discussion see [68]). Specifically, we inferred relationships between populations using allele frequency covariance graphs implemented in TreeMix v.1.13 [40]. Arabidopsis arenosa was rooted with the diploid Pannonian population BDO and A. lyrata with the diploid Scandinavian population LOM. To assess confidence in the reconstructed topologies of A. arenosa and A. lyrata (Figs 1B and 1C), we performed a bootstrap analysis with 1000 bp blocks (matching the selection scan window size below) and 100 replicates. Further, we used Bayesian clustering in fastSTRUCTURE [42], for which we randomly sampled two alleles per tetraploid individual using a custom script; according to [69], this approach does not appear to bias clustering of autotetraploid samples. Finally, we displayed genetic relatedness among individuals using principal component analysis (PCA) as implemented in the R package ‘adegenet’ [70]. We calculated genome-wide within-population metrics at four-fold degenerate (4d) sites (nucleotide diversity (π) and Tajima’s D [71]) using the python3 ‘ScanTools_ProtEvol’ pipeline [31].
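
The random subsampling of two alleles per tetraploid individual, which turns tetraploid genotypes into diploid-like input for fastSTRUCTURE, can be sketched as follows. This is a hypothetical Python illustration, not the authors' custom script: it assumes each tetraploid genotype is coded as the number of alternate-allele copies (dosage 0 to 4) at a site, and the function names are invented for this example.

```python
import random
from typing import List, Optional


def subsample_two_alleles(alt_count: int, ploidy: int = 4,
                          rng: Optional[random.Random] = None) -> int:
    """Draw two of an individual's `ploidy` allele copies without replacement
    and return how many of the two carry the alternate allele (0, 1 or 2)."""
    if not 0 <= alt_count <= ploidy:
        raise ValueError("alt_count must lie between 0 and ploidy")
    rng = rng or random.Random()
    # Represent the individual's allele copies explicitly: 1 = alternate, 0 = reference.
    copies = [1] * alt_count + [0] * (ploidy - alt_count)
    return sum(rng.sample(copies, 2))


def pseudo_diploidize(dosages: List[int], ploidy: int = 4, seed: int = 0) -> List[int]:
    """Convert one tetraploid individual's per-site dosages (0..4) into
    diploid-like genotypes (0..2); a fixed seed keeps the draw reproducible."""
    rng = random.Random(seed)
    return [subsample_two_alleles(d, ploidy, rng) for d in dosages]


# Hypothetical example: five sites of one tetraploid individual.
print(pseudo_diploidize([0, 1, 2, 3, 4]))
```

Because the two alleles are drawn without replacement, a duplex tetraploid (dosage 2) becomes a diploid heterozygote with probability 4/6 = 2/3, while homozygous sites (dosage 0 or 4) are always kept homozygous.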

To test for introgression, we used D-statistics (ABBA-BABA tests) as described in [43]. We assessed introgression between tetraploid A. arenosa (P2) and each A. lyrata tetraploid lineage (P3). As the non-admixed population (P1), we used population BDO of the early diverging diploid Pannonian lineage of A. arenosa [72]. Allele frequencies were polarized using diploid A. halleri (population GUN from Austria) as the outgroup.

Genome-wide scans for positive selection

To find candidate positively selected genes (PSGs) that likely contributed to adaptation to WGD in tetraploids, we employed two selection scan methods suitable for both diploids and autotetraploids, following our best practices [68]. These methods are based on metrics estimated from population allele frequencies, which are comparable across diploid and tetrasomic autotetraploid populations, in contrast to methods that rely on ploidy-specific assumptions about individual genotypes [44,68].

First, for each tetraploid population, we calculated Fst [73] in 1 kb windows against the most proximal diploid population within the same lineage. We then used PicMin [44] to test whether there was evidence of repeated genetic differentiation among the tetraploid/diploid population pairs. PicMin uses order statistics to test whether population genetic summary statistics (in this case Fst) for orthologous genomic regions exhibit a common shift towards extreme values in multiple lineages, indicative of repeated action of positive selection. PicMin was applied to windows that had data for at least three lineages (86,249 windows in total). A genome-wide false discovery rate correction was then performed with a significance threshold of q < 0.01. Where an outlier signal spanned adjacent windows, we retained the window with the lowest q-value and highest Fst.
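
The order-statistic idea behind this scan can be sketched in a few lines. This is a simplified illustration, not a reimplementation of the PicMin package: per-lineage window Fst values are first converted to upper-tail empirical p-values, and each window is then tested with the Beta distribution of the k-th smallest of m uniform p-values for k ≥ 2, so that a single extreme lineage cannot produce a signal. The Bonferroni combination across k is our conservative stand-in for PicMin's own correction for correlated tests, and all function names are hypothetical.

```python
from bisect import bisect_left
from math import comb
from typing import Dict, List, Optional


def empirical_pvalues(values: List[float]) -> List[float]:
    """Upper-tail empirical p-value per window: the fraction of windows
    (itself included) whose Fst is at least as large."""
    ranked = sorted(values)
    n = len(ranked)
    return [(n - bisect_left(ranked, v)) / n for v in values]


def order_stat_cdf(x: float, k: int, m: int) -> float:
    """P(U_(k) <= x) for the k-th smallest of m i.i.d. Uniform(0,1) values,
    i.e. the Beta(k, m - k + 1) CDF written as a binomial tail sum."""
    return sum(comb(m, j) * x ** j * (1 - x) ** (m - j) for j in range(k, m + 1))


def repeated_extreme_p(pvals: List[float]) -> float:
    """Smallest order-statistic p-value over k = 2..m, Bonferroni-corrected.
    Starting at k = 2 means one extreme lineage alone cannot drive the test."""
    p = sorted(pvals)
    m = len(p)
    if m < 2:
        raise ValueError("need p-values from at least two lineages")
    tests = [order_stat_cdf(p[k - 1], k, m) for k in range(2, m + 1)]
    return min(1.0, min(tests) * len(tests))


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


def scan(windows: Dict[str, List[Optional[float]]],
         min_lineages: int = 3, q_cut: float = 0.01) -> Dict[str, float]:
    """Keep windows with data for >= min_lineages lineages (None = missing),
    test each one, and return windows whose q-value falls below q_cut."""
    tested = {w: [p for p in ps if p is not None] for w, ps in windows.items()}
    tested = {w: ps for w, ps in tested.items() if len(ps) >= min_lineages}
    names = list(tested)
    q = benjamini_hochberg([repeated_extreme_p(tested[w]) for w in names])
    return {w: qw for w, qw in zip(names, q) if qw < q_cut}
```

In use, `empirical_pvalues` would be applied separately to each lineage's genome-wide window Fst values, and the resulting per-lineage p-values for each window passed to `scan`; the thresholds of three lineages and q < 0.01 mirror the values stated above.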

We also employed a ‘candidate SNP’ approach by calculating SNP-based Fst between each tetraploid population and the most proximal diploid population within its lineage. A 1% outlier threshold was used to identify highly differentiated SNPs (‘candidate SNPs’ hereafter) across the genome. We determined the density of these candidate SNPs per gene using A. lyrata gene models [74] and retained the genes in the upper quartile of candidate SNP density as candidate genes. Fisher’s exact test (‘SuperExactTest’ package [75] in R) was then applied to test whether the overlap of candidate genes among lineages exceeded random expectation. Notably, all PicMin-identified genes were also recovered by this candidate SNP approach.
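
The three filtering steps of the candidate SNP approach can be sketched as follows. This hypothetical Python illustration assumes SNP positions and gene coordinates share a single coordinate system (one chromosome), measures density as outlier SNPs per kb of gene length, and replaces the SuperExactTest significance test with a plain intersection of candidate sets; none of the function names come from the authors' pipeline.

```python
from typing import Dict, List, Set, Tuple


def top_threshold(values: List[float], top_fraction: float) -> float:
    """Smallest value that still falls inside the top `top_fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]


def candidate_genes(snp_fst: List[Tuple[int, float]],
                    genes: Dict[str, Tuple[int, int]],
                    snp_top: float = 0.01, gene_top: float = 0.25) -> Set[str]:
    """Genes in the top quartile of outlier-SNP density (outliers per kb).
    `snp_fst` holds (position, Fst) pairs; `genes` maps name -> (start, end)."""
    cutoff = top_threshold([f for _, f in snp_fst], snp_top)
    outliers = [pos for pos, f in snp_fst if f >= cutoff]
    density = {}
    for name, (start, end) in genes.items():
        n_out = sum(start <= pos <= end for pos in outliers)
        density[name] = n_out / ((end - start + 1) / 1000)
    gene_cutoff = top_threshold(list(density.values()), gene_top)
    # Require at least one outlier so genes without any are never kept.
    return {g for g, d in density.items() if d >= gene_cutoff and d > 0}


def shared_candidates(per_lineage: List[Set[str]]) -> Set[str]:
    """Genes identified as candidates in every lineage (plain intersection)."""
    return set.intersection(*per_lineage)
```

Running `candidate_genes` once per tetraploid/diploid pair and passing the resulting sets to `shared_candidates` yields the repeatedly differentiated genes; a formal test of whether such an overlap exceeds chance would still require a method such as SuperExactTest.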

Functional annotation

To infer functions significantly enriched in our list of tetraploid PSGs, we performed gene ontology (GO) and UniProt Keywords enrichment analyses using the STRING database (last accessed 02/03/2023, [46]). We used Arabidopsis thaliana orthologs of A. lyrata genes. Only categories with FDR < 0.05 were considered. We also manually searched for functional descriptions of each gene using the TAIR database and associated literature (last accessed 02/03/2023, [47]). To identify protein-protein interactions, we used the STRING database, including all available information sources, and focused on 1st shell interactions.

Haplotype block reconstruction

Because standard phasing algorithms are unreliable for short-read data from tetraploids [76], we first used a SNP frequency-based approach to reconstruct ‘haplotype blocks’, defined here as sets of linked, highly differentiated SNPs. Their linkage was later confirmed with long-read data (see below), and we therefore refer to them as ‘haplotypes’. We reconstructed haplotypes for 12 PSGs (the subset of the 17 PSGs supported by at least five highly differentiated candidate SNPs) within A. lyrata and A. arenosa populations (232 candidate SNPs in total; S4 Data). We followed the procedure of [7]; briefly, haplotype block frequencies (HBFs) were calculated separately for diploids (218 A. arenosa and 121 A. lyrata individuals) and tetraploids (479 individuals). For each of the 12 PSGs with n candidate SNPs (for n see Fig 3B), we determined the allele frequency (AF) of the major allele across all individuals. Because our dataset contains 1916 tetraploid haplotypes but only 436 (A. arenosa) and 242 (A. lyrata) diploid haplotypes, the major allele corresponds to the tetraploid allele. We defined the tetraploid haplotype block frequency as the minimum of the n major-allele AFs (HBFt = min {major AF}), and the diploid haplotype block frequency as HBFd = 1 – max {major AF}. The frequency of all other haplotype blocks, resulting from recombination or de novo mutation, was defined as HBFr = 1 – HBFt – HBFd. Calculations were performed using an in-house R script.
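
The haplotype block frequency definitions above translate directly into code. The sketch below is a hypothetical Python rendering, not the authors' in-house R script; it assumes that, for a given group of samples, the input lists hold the alternate-allele frequency at each of a PSG's n candidate SNPs together with the pooled (all-sample) alternate-allele frequency used to identify the major, tetraploid allele.

```python
from typing import List, Tuple


def orient_to_major(group_alt_af: List[float], pooled_alt_af: List[float]) -> List[float]:
    """Re-express each SNP's frequency in a group as the frequency of the allele
    that is major in the pooled dataset (taken to be the tetraploid allele)."""
    return [f if pooled >= 0.5 else 1.0 - f
            for f, pooled in zip(group_alt_af, pooled_alt_af)]


def haplotype_block_frequencies(major_af: List[float]) -> Tuple[float, float, float]:
    """HBFt = min(major AF), HBFd = 1 - max(major AF), HBFr = 1 - HBFt - HBFd."""
    if not major_af:
        raise ValueError("need at least one candidate SNP")
    hbf_t = min(major_af)
    hbf_d = 1.0 - max(major_af)
    hbf_r = 1.0 - hbf_t - hbf_d
    return hbf_t, hbf_d, hbf_r


# Hypothetical PSG with three candidate SNPs in a tetraploid group.
af = orient_to_major([0.9, 0.2, 0.95], [0.7, 0.3, 0.8])
print(haplotype_block_frequencies(af))
```

Substituting the definitions shows that HBFr simplifies to max {major AF} − min {major AF}, so it is never negative and measures how unevenly the tetraploid allele is shared across the block's candidate SNPs.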

We next validated our haplotype block analysis by assessing the correspondence between the haplotype blocks inferred from allele frequencies and (1) sequences of the 12 PSGs from PacBio HiFi-based long-read assemblies of five diploid and five tetraploid A. arenosa samples (see S9 Data for metadata), and (2) long reads spanning two PSGs in two diploid and two tetraploid A. arenosa samples. For (1), we used BLASTn v.2.10.0 to extract the sequences of the 12 PSGs from the newly generated diploid and tetraploid assemblies. These sequences were then aligned using MUSCLE and visualized with IGV v.2.11.9 (S5 Data). For (2), we aligned the long reads spanning the PSGs CYCA2;3 and PDS5b to the assemblies, visualized them with IGV, and manually recorded the physical linkage of candidate SNPs (S7 Data). Because the PacBio HiFi data from five diploid and five tetraploid A. arenosa individuals confirmed linkage across candidate SNPs, we refer to these blocks as haplotypes. This does not, however, imply that all tetraploids carry an identical sequence between the first and last SNP of a haplotype block (a haplotype in the strict sense), since many rare SNPs occur between the linked candidate SNPs.

Allelic sources of tetraploid haplotypes across the candidate positively selected genes

To assess if the tetraploid haplotypes originated from a single source or as a mosaic of allelic sources, we compiled a collection of genomes from 983 individuals: 818 individuals from 46 diploid and 61 tetraploid populations of A. lyrata and A. arenosa, and 165 individuals from four congeners. We employed two methods: tracing the evolutionary origins of candidate SNPs across the ancestral diploid lineages [50], and reconstructing networks of genetic distances at the PSG regions [70].

To determine the allelic sources contributing to tetraploid haplotypes, we investigated the presence or absence of the variants defining them (the 232 ‘candidate SNPs’) within diploid individuals. We used a comprehensive genomic dataset encompassing diploid A. lyrata and A. arenosa (339 diploid individuals) alongside their outcrossing relatives A. halleri, A. croatica, A. cebennensis, and A. pedemontana (‘diploid congeners’ or ‘outgroup’, totaling 165 individuals; see S1 Data). A rarefaction analysis in [7] indicated that sampling as few as 40 diploid individuals across the A. arenosa species range captures the majority of diploid diversity. Our dataset of 504 diploid individuals should therefore capture most of the natural diversity segregating in these diploids. We excluded singletons (variants occurring only once in a species) to reduce the impact of sequencing errors (results were robust with or without singletons; S6 Fig). Then, using the species’ phylogenetic relationships ([51], Fig 4A), we determined the likely allelic sources of the 232 candidate SNPs among the 504 diploid individuals. We assigned each SNP to one of the following seven scenarios (Fig 4A): 1) trans-specific polymorphism in both A. lyrata and A. arenosa; 2) trans-specific polymorphism in A. arenosa only (the tetraploid SNP occurs in A. arenosa but not A. lyrata diploids, and in one to all diploid congeners); 3) trans-specific polymorphism in A. lyrata only (the tetraploid SNP occurs in A. lyrata but not A. arenosa diploids, and in one to all diploid congeners); 4) trans-specific polymorphism in the congeners only; 5) A. arenosa-specific polymorphism; 6) A. lyrata-specific polymorphism; 7) tetraploid-private polymorphism. Here, we distinguish between trans-specific and species-specific variation at the diploid level, which reflects ancestral allele sharing rather than recent introgression.
Note that every tetraploid SNP identified here ultimately represents trans-specific polymorphism, because it is shared between tetraploids of both A. arenosa and A. lyrata.
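
The seven-scenario classification can be written as a small decision function. The following Python sketch is illustrative only and makes several assumptions not spelled out in the text: input is the number of copies of the tetraploid allele observed in each diploid species, a singleton is interpreted as an allele copy count of one within a species, and scenario 1 is assigned whenever the allele occurs in diploids of both A. arenosa and A. lyrata, regardless of the congeners. Species keys and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

CONGENERS = ("halleri", "croatica", "cebennensis", "pedemontana")


def present(copies: int, drop_singletons: bool = True) -> bool:
    """Treat an allele seen only once in a species as absent if requested."""
    return copies > 1 if drop_singletons else copies > 0


def source_scenario(copies: Dict[str, int], drop_singletons: bool = True) -> int:
    """Assign a tetraploid candidate allele to scenarios 1-7 from the number of
    copies observed in each diploid species ('arenosa', 'lyrata', CONGENERS)."""
    are = present(copies.get("arenosa", 0), drop_singletons)
    lyr = present(copies.get("lyrata", 0), drop_singletons)
    cong = any(present(copies.get(s, 0), drop_singletons) for s in CONGENERS)
    if are and lyr:
        return 1  # trans-specific in both A. lyrata and A. arenosa
    if are and cong:
        return 2  # A. arenosa plus congener(s), not A. lyrata
    if lyr and cong:
        return 3  # A. lyrata plus congener(s), not A. arenosa
    if cong:
        return 4  # congeners only
    if are:
        return 5  # A. arenosa-specific
    if lyr:
        return 6  # A. lyrata-specific
    return 7      # tetraploid-private


def scenario_proportions(snps: List[Dict[str, int]],
                         drop_singletons: bool = True) -> Dict[int, float]:
    """Fraction of SNPs assigned to each of the seven scenarios."""
    counts = Counter(source_scenario(s, drop_singletons) for s in snps)
    return {k: counts[k] / len(snps) for k in range(1, 8)}
```

The `drop_singletons` switch corresponds to the comparison with and without singletons summarized in S6 Fig.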

As additional evidence to estimate variability in allelic sources of tetraploid haplotypes, we constructed genetic distance networks using Nei’s distance [77] in the ‘adegenet’ package in R [70]. We visualized these networks using SplitsTree [41].
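
For reference, the distance underlying these networks can be sketched as follows, assuming it is the standard genetic distance of Nei (1972), D = −ln(Jxy / √(Jx·Jy)), computed over biallelic loci. This Python function is a hypothetical illustration rather than the adegenet routine; for individuals of mixed ploidy, one option (an assumption here) is to use dosage divided by ploidy as each individual's allele frequency.

```python
import math
from typing import List


def nei_standard_distance(p_x: List[float], p_y: List[float]) -> float:
    """Nei's (1972) standard genetic distance for biallelic loci, where p_x[l]
    and p_y[l] are the frequencies of the same allele at locus l."""
    if len(p_x) != len(p_y) or not p_x:
        raise ValueError("need matching, non-empty frequency lists")
    jx = jy = jxy = 0.0
    for a, b in zip(p_x, p_y):
        jx += a * a + (1 - a) * (1 - a)      # homozygosity within x
        jy += b * b + (1 - b) * (1 - b)      # homozygosity within y
        jxy += a * b + (1 - a) * (1 - b)     # identity between x and y
    # Averaging over loci divides all three sums by the same number of loci,
    # so that factor cancels in the ratio below.
    if jxy == 0:
        return math.inf  # the two samples share no alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))
```

For example, a sample fixed for an allele compared with one at frequency 0.5 gives D = ln(2)/2 ≈ 0.35, whereas identical frequencies give D = 0.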

Allelic sources of the genomic background

To test whether the allelic sources of the 232 candidate SNPs for adaptation to WGD differed from those of the genomic background, we estimated the relative contributions of the same allelic sources, i.e., the seven scenarios outlined above, to genome-wide synonymous variation. To make this analysis comparable with that of the 232 candidate SNPs, we proceeded as follows. All variable sites within the core dataset of 155 Central European A. arenosa and A. lyrata individuals (depth ≥ 4, missing data ≤ 60%) were extracted. From these 5,092,575 SNPs, a BED file of synonymous SNPs was generated, containing 1,341,033 SNPs in total. The SNPs listed in this BED file were then extracted, applying a filter for missing data ≤ 20%, i) from a dataset comprising only the 871 individuals of A. arenosa and A. lyrata (1,198,065 SNPs), and ii) from a dataset of 173 outgroup individuals (339,761 SNPs). The SNPs obtained from both datasets were combined, sites invariant across all tetraploids were removed, and a random subset of 232 SNPs across the genome was drawn 1000 times. To avoid overestimating the proportion of trans-specific polymorphism, SNPs missing from the outgroup dataset were treated as invariant. The SNP subsets were polarized towards the tetraploids of A. arenosa and A. lyrata. Allelic source scenarios for each of the 232 sampled SNPs were inferred in the same way as for the 232 candidate SNPs for adaptation to WGD.
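
The resampling of the genomic background can be sketched as below. This hypothetical Python example assumes that every filtered synonymous SNP has already been assigned a source scenario (1 to 7) using the same rules as for the candidate SNPs; it draws 232 SNPs without replacement 1000 times and summarizes each scenario's proportion with a mean and a crude 95% percentile interval. The function names and the interval method are illustrative choices, not the authors' procedure.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple


def resample_background(labels: List[int], n_snps: int = 232,
                        n_reps: int = 1000, seed: int = 1) -> Dict[int, List[float]]:
    """Draw `n_snps` background SNPs without replacement `n_reps` times and
    record the proportion of each source scenario (1-7) in every draw."""
    rng = random.Random(seed)
    props: Dict[int, List[float]] = {s: [] for s in range(1, 8)}
    for _ in range(n_reps):
        draw = rng.sample(labels, n_snps)
        for s in props:
            props[s].append(draw.count(s) / n_snps)
    return props


def summarize(props: Dict[int, List[float]]) -> Dict[int, Tuple[float, float, float]]:
    """Mean and approximate 2.5th / 97.5th percentiles per scenario."""
    out = {}
    for s, vals in props.items():
        v = sorted(vals)
        lo = v[int(0.025 * (len(v) - 1))]
        hi = v[int(0.975 * (len(v) - 1))]
        out[s] = (mean(v), lo, hi)
    return out
```

Comparing the observed scenario proportions of the 232 candidate SNPs against these resampled intervals gives a simple view of whether the candidates deviate from the background, as summarized in S5 Table.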

Supporting information

S1 Fig. Population structure of A. arenosa (A) and A. lyrata (A-D).

A: PCA showing the separation between A. arenosa and A. lyrata, based on 50,000 SNPs from scaffold_1. B-D: PCA, Nei’s distance-based neighbor-joining tree, and fastSTRUCTURE plot all support three lineages of tetraploid A. lyrata, named AL Germany, AL Czechia, and AL Austria. These analyses were based on 1,094,553 genome-wide four-fold degenerate SNPs. Red: diploid populations; blue: tetraploid populations.

(EPS)

S2 Fig. Location of candidate genes (black vertical lines) on A. lyrata reference chromosomes colored by bins of distinct recombination rate per gene.

The figure indicates that candidate genes do not cluster in regions with extreme per-gene recombination rates (blue), as estimated from the available A. lyrata recombination map [56].

(EPS)

S3 Fig. Candidate positively selected genes (PSGs) involved in processes of cell cycle regulation, meiosis, and transcriptional regulation vary in their proportion of differentiated cis-regulatory, nonsynonymous, and synonymous SNPs.

The number above each bar gives the number of differentiated SNPs within the gene. Approximately 5 million genic and cis-regulatory SNPs from the whole genome sequencing data served as controls.

(EPS)

S4 Fig. Phenotypic shift associated with the establishment of tetraploids of A. arenosa and A. lyrata in the form of a decrease in the level of DNA endoreduplication.

Top: Proportion of endoreduplicated nuclei in leaves of 10 individuals per lineage and ploidy. **: p < 0.01, ****: p < 0.0001, ns: nonsignificant, Wilcoxon rank sum test. The horizontal violet line shows the level of endoreduplication expected under a scenario of no post-WGD adaptation, estimated from synthetic neo-tetraploids of A. arenosa. Bottom: Prediction for the compensatory evolution of polyploid traits from [2], and observed average DNA content per leaf cell (calculated as the mean number of homologous chromosomes per nucleus) for diploid, neo-tetraploid, and tetraploid A. arenosa.

(EPS)

S5 Fig. Comparison of leaf endoreduplication among diploids.

Upper boxplots show the endoreduplication level, calculated as the number of endoreduplicated nuclei divided by the total number of nuclei analyzed; bottom plots show the maximum number of endoreduplication cycles in the leaf for each diploid lineage. ****: p < 0.0001, ***: p < 0.001, ns: nonsignificant, Wilcoxon rank sum test. Each boxplot represents 10 individuals.

(EPS)

S6 Fig. Variable allelic sources of tetraploid haplotypes for each of the 12 tetraploid positively selected genes (PSGs).

Barplots show the proportion of candidate SNPs representing each of the seven source scenarios (see Fig 4A for graphical visualization of scenarios). Upper plot shows the source when including singletons, bottom plot after filtering singletons out.

(EPS)

S7 Fig. Nei’s distance-based neighbor-joining networks of tetraploid positively selected genes (PSGs) across diploid Arabidopsis (A. arenosa, A. lyrata, A. croatica, A. cebennensis, and A. pedemontana) and tetraploid Arabidopsis (A. arenosa, A. lyrata).

Tetraploids from all four tetraploid lineages of A. arenosa and A. lyrata (blue) always form a single lineage, suggesting the presence of a single shared tetraploid haplotype at the locus. Further, they form an unresolved network with diploids of multiple species (red), suggesting diversity of allelic sources of each shared tetraploid haplotype (‘mosaic scenario’).

(EPS)

S1 Table. Genome-wide nucleotide diversity and Tajima’s D of A. lyrata populations newly sequenced here, calculated over four-fold degenerate sites.

(DOCX)

S2 Table. Summary of the 17 candidate tetraploid positively selected genes (PSGs).

(DOCX)

S3 Table. Functional annotation of the 17 tetraploid positively selected genes (PSGs).

(DOCX)

S4 Table. Presence and frequency of tetraploid, two diploid, and other haplotype blocks in all 61 tetraploid populations of A. arenosa and A. lyrata (479 individuals).

AF: allele frequency of haplotype.

(DOCX)

S5 Table. Variable allelic sources of tetraploid haplotypes for 232 synonymous SNPs randomly sampled 1000 times to represent the genomic background, versus the 232 candidate SNPs of the 12 positively selected genes (PSGs).

The proportion of SNPs representing each of the seven source scenarios is given (see Fig 4A for graphical visualization of scenarios).

(DOCX)

S1 Data. Metadata and sequence quality checks for the 983 whole genome sequenced individuals.

(XLSX)

S2 Data. Sampling locations of the 129 populations.

(XLSX)

S3 Data. Set of 54 significant PicMin windows and genome-wide PicMin results underlying the PicMin Manhattan plot.

(XLSX)

S4 Data. List of outlier genes identified using the ‘candidate SNP’ approach and their overlap among tetraploid lineages.

Candidate SNPs were used in the selection scan to identify positively selected genes (PSGs), following a three-step procedure. First, 1% outlier SNPs were identified; second, the top quartile of genes with the highest density of outlier SNPs was identified; and third, these genes were overlapped among the four tetraploid lineages of A. arenosa and A. lyrata to identify repeatedly differentiated genes. The PSGs and their overlap are shown here.

(XLSX)

S5 Data. Sequences of the 12 positively selected genes (PSGs), assembled using long read sequencing of five diploid and five tetraploid individuals.

(XLSX)

S6 Data. Table of linked candidate SNPs, as determined using long read sequencing.

Position of variants within the same column (columns D-N) highlights their physical linkage within a gene region (black boxes). Note that these linked candidate SNPs were used as markers to reconstruct haplotypes from short read data, which is summarized in columns Q-AMJ.

(XLSX)

S7 Data. Candidate SNPs marking haplotypes of CYCA2;3 and PDS5b (Figs 4C and 4D), as found on the same long read.

(XLSX)

S8 Data. Distribution of the 232 tetraploid candidate SNPs among the 504 diploid samples used to estimate the likely sources of tetraploid haplotypes.

(XLSX)

S9 Data. Metadata and sequence quality checks for the 10 long read-sequenced individuals using PacBio HiFi.

(XLSX)

S1 Text. Detailed functional interpretations of the candidate positively selected genes (PSGs).

(DOCX)

S2 Text. Detailed methods and results of estimating DNA endoreduplication using flow cytometry.

(DOCX)

S3 Text. Hypotheses about the spatio-temporal context of the origin of the ‘mosaic’ haplotypes.

(DOCX)

Acknowledgments

We greatly appreciate the constructive feedback from members of the Ecolgen team in Prague, Katie Peichel, Reto Burri, Alison Scott, Polina Novikova, Andrew MacColl, Tuomas Hämälä, John Brookfield, Emma Curran, Laura Dean, Sian Bray, and Ana da Silvia. We further appreciate inspiration provided by the ForBio Polyploid course in Drøbak, Norway. We thank Martin Čertner and Dorka Čertnerová for sharing their polyploid synthesis protocol, Bodo Schwarzberg (Untere Naturschutzbehörde Nordhausen) for sharing seeds of the STD population, and Bertram Preuschhof (Untere Naturschutzbehörde Göttingen) for a collecting permit for the SCT population. Sequencing was performed by the Norwegian Sequencing Centre, University of Oslo. Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, under the program Projects of Large Research, Development, and Innovations Infrastructures.

Data Availability

Sequence data that support the findings of this study are deposited at NCBI (https://www.ncbi.nlm.nih.gov/bioproject/) under BioProjects PRJNA284572, PRJNA309929, PRJNA357693, PRJNA357372, PRJNA459481, PRJNA493227, PRJEB34247 (ENA), PRJNA506705, PRJNA484107, PRJNA592307, PRJNA667586, and PRJNA929698. See S1 Data for individual codes.
ScanTools_ProtEvol pipeline: github.com/mbohutinska/ScanTools_ProtEvol
ABBA-BABA pipeline: github.com/simonhmartin/tutorials/tree/master/ABBA_BABA_whole_genome
PicMin: github.com/TBooker/PicMin
Allele frequencies of haplotype blocks: github.com/mbohutinska/repeatedWGD, section ‘Haplotype AF’.

Funding Statement

This work was supported by the Czech Science Foundation (project 20-22783S to F.K., project 19-06632S to K.M.), Leverhulme Trust award (no. RPG-2020-367 to L.Y.), PRIMUS Research Programme of Charles University (PRIMUS/17/SCI/23 to R.S.), European Union’s research and innovation programme under the Marie Skłodowska-Curie (project 101062703 to M.B.), European Research Council (project 850852 DOUBLEADAPT to F.K.), Charles University Grant Agency (no. 219223 to A.P.), and long-term research development project no. RVO 67985939 of the Czech Academy of Sciences. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Doyle JJ, Coate JE. Polyploidy, the nucleotype, and novelty: the impact of genome doubling on the biology of the cell. Int J Plant Sci. 2019;180(1): 1–52. [Google Scholar]
  • 2.Bomblies K. When everything changes at once: finding a new normal after genome duplication. Proc R Soc Lond B Biol Sci. 2020;287(1939): 20202154. doi: 10.1098/rspb.2020.2154 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Bomblies K. Learning to tango with four (or more): the molecular basis of adaptation to polyploid meiosis. Plant Reprod. 2023;36(1): 107–24. doi: 10.1007/s00497-022-00448-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Lv Z, Addo Nyarko C, Ramtekey V, Behn H, Mason AS. Defining autopolyploidy: cytology, genetics, and taxonomy. Am J Bot. 2024;111(8): e16292. doi: 10.1002/ajb2.16292 [DOI] [PubMed] [Google Scholar]
  • 5.Yant L, Hollister JD, Wright KM, Arnold BJ, Higgins JD, Franklin FCH, et al. Meiotic adaptation to genome duplication in Arabidopsis arenosa. Curr Biol. 2013;23(21): 2151–6. doi: 10.1016/j.cub.2013.08.059 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Bray SM, Hämälä T, Zhou M, Busoms S, Fischer S, Desjardins SD, et al. Kinetochore and ionomic adaptation to whole-genome duplication in Cochlearia shows evolutionary convergence in three autopolyploids. Cell Rep. 2024;43(8): 114576. doi: 10.1016/j.celrep.2024.114576 [DOI] [PubMed] [Google Scholar]
  • 7.Bohutínská M, Handrick V, Yant L, Schmickl R, Kolář F, Bomblies K, et al. De novo mutation and rapid protein (co-)evolution during meiotic adaptation in Arabidopsis arenosa. Mol Biol Evol. 2021;38(5): 1980–94. doi: 10.1093/molbev/msab001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Bohutínská M, Alston M, Monnahan P, Mandáková T, Bray S, Paajanen P, et al. Novelty and convergence in adaptation to whole genome duplication. Mol Biol Evol. 2021;38(9): 3910–24. doi: 10.1093/molbev/msab096 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Imai KK, Ohashi Y, Tsuge T, Yoshizumi T, Matsui M, Oka A, et al. The A-type cyclin CYCA2;3 is a key regulator of ploidy levels in Arabidopsis endoreduplication. Plant Cell. 2006;18(2): 382–96. doi: 10.1105/tpc.105.037309 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Sterken R, Kiekens R, Boruc J, Zhang F, Vercauteren A, Vercauteren I, et al. Combined linkage and association mapping reveals CYCD5;1 as a quantitative trait gene for endoreduplication in Arabidopsis. Proc Natl Acad Sci U S A. 2012;109(12): 4678–83. doi: 10.1073/pnas.1120811109 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Potapova TA, Seidel CW, Box AC, Rancati G, Li R. Transcriptome analysis of tetraploid cells identifies cyclin D2 as a facilitator of adaptation to genome doubling in the presence of p53. Mol Biol Cell. 2016;27(20): 3065–84. doi: 10.1091/mbc.E16-05-0268 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Crockford A, Zalmas LP, Grönroos E, Dewhurst SM, McGranahan N, Cuomo ME, et al. Cyclin D mediates tolerance of genome-doubling in cancers with functional p53. Ann Oncol. 2017;28(1): 149–56. doi: 10.1093/annonc/mdw612 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Morgan C, Zhang H, Henry CE, Franklin FCH, Bomblies K. Derived alleles of two axis proteins affect meiotic traits in autotetraploid Arabidopsis arenosa. Proc Natl Acad Sci U S A. 2020;117(16): 8980–88. doi: 10.1073/pnas.1919459117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Morgan C, Knight E, Bomblies K. The meiotic cohesin subunit REC8 contributes to multigenic adaptive evolution of autopolyploid meiosis in Arabidopsis arenosa. PLoS Genet. 2022;18(7): e1010304. doi: 10.1371/journal.pgen.1010304 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Westermann J, Srikant T, Gonzalo A, Tan HS, Bomblies K. Defective pollen tube tip growth induces neo-polyploid infertility. Science. 2024;383(6686): eadh0755. doi: 10.1126/science.adh0755 [DOI] [PubMed] [Google Scholar]
  • 16.Marhold K, Lihová J. Polyploidy, hybridization and reticulate evolution: lessons from the Brassicaceae. Pl Syst Evol. 2006;259: 143–74. [Google Scholar]
  • 17.Lafon-Placette C, Johannessen IM, Hornslien KS, Ali MF, Bjerkan KN, Bramsiepe J, et al. Endosperm-based hybridization barriers explain the pattern of gene flow between Arabidopsis lyrata and Arabidopsis arenosa in Central Europe. Proc Natl Acad Sci U S A. 2017;114(6): E1027–35. doi: 10.1073/pnas.1615123114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Schmickl R, Yant L. Adaptive introgression: how polyploidy reshapes gene flow landscapes. New Phytol. 2021;230(2): 457–61. doi: 10.1111/nph.17204 [DOI] [PubMed] [Google Scholar]
  • 19.Arnold BJ, Lahner B, DaCosta JM, Weisman CM, Hollister JD, Salt DE, et al. Borrowed alleles and convergence in serpentine adaptation. Proc Natl Acad Sci U S A. 2016;113(29): 8320–25. doi: 10.1073/pnas.1600405113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Marburger S, Monnahan P, Seear PJ, Martin SH, Koch J, Paajanen P, et al. Interspecific introgression mediates adaptation to whole genome duplication. Nat Commun. 2019;10(1): 5218. doi: 10.1038/s41467-019-13159-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Slotte T, Huang H, Lascoux M, Ceplitis A. Polyploid speciation did not confer instant reproductive isolation in Capsella (Brassicaceae). Mol Biol Evol. 2008;25(7): 1472–81. doi: 10.1093/molbev/msn092 [DOI] [PubMed] [Google Scholar]
  • 22.Te Beest M, Le Roux JJ, Richardson DM, Brysting AK, Suda J, Kubešová M, et al. The more the better? The role of polyploidy in facilitating plant invasions. Ann Bot. 2012;109(1): 19–45. doi: 10.1093/aob/mcr277 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Baduel P, Bray S, Vallejo-Marin M, Kolář F, Yant L. The ‘Polyploid Hop’: shifting challenges and opportunities over the evolutionary lifespan of genome duplications. Front Ecol Evol. 2018;6: 117. [Google Scholar]
  • 24.Morgan EJ, Čertner M, Lučanová M, Deniz U, Kubíková K, Venon A, et al. Disentangling the components of triploid block and its fitness consequences in natural diploid-tetraploid contact zones of Arabidopsis arenosa. New Phytol. 2021;232(3): 1449–62. [DOI] [PubMed] [Google Scholar]
  • 25.Ronfort J. The mutation load under tetrasomic inheritance and its consequences for the evolution of the selfing rate in autotetraploid species. Genet Res (Camb). 1999;74(1): 31–42. [Google Scholar]
  • 26.Monnahan P, Kolář F, Baduel P, Sailer C, Koch J, Horvath R, et al. Pervasive population genomic consequences of genome duplication in Arabidopsis arenosa. Nat Ecol Evol. 2019;3(3): 457–68. doi: 10.1038/s41559-019-0807-4 [DOI] [PubMed] [Google Scholar]
  • 27.Hämälä T, Moore C, Cowan L, Carlile M, Gopaulchan D, Brandrud MK, et al. Impact of whole-genome duplications on structural variant evolution in Cochlearia. Nat Commun. 2024;15: 5377. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Selmecki AM, Maruvka YE, Richmond PA, Guillet M, Shoresh N, Sorenson AL, et al. Polyploidy can drive rapid adaptation in yeast. Nature. 2015;519(7543): 349–52. doi: 10.1038/nature14187 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Oziolor EM, Reid NM, Yair S, Lee KM, Guberman VerPloeg S, Bruns PC, et al. Adaptive introgression enables evolutionary rescue from extreme environmental pollution. Science. 2019;364(6439): 455–57. doi: 10.1126/science.aav4155 [DOI] [PubMed] [Google Scholar]
  • 30.Lee KM, Coop G. Population genomics perspectives on convergent adaptation. Philos Trans R Soc Lond B Biol Sci. 2019;374: 20180236. doi: 10.1098/rstb.2018.0236 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Bohutínská M, Vlček J, Yair S, Laenen B, Konečná V, Fracassetti M, et al. Genomic basis of parallel adaptation varies with divergence in Arabidopsis and its relatives. Proc Natl Acad Sci U S A. 2021;118(21): e2022713118. doi: 10.1073/pnas.2022713118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Konečná V, Bray S, Vlček J, Bohutínská M, Požárová D, Roy Choudhury R, et al. Parallel adaptation in autopolyploid Arabidopsis arenosa is dominated by repeated recruitment of shared alleles. Nat Commun. 2021;12(4979): 1–13. doi: 10.1038/s41467-021-25256-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Wang L, Josephs EB, Lee KM, Roberts LM, Rellán-Álvarez R, Ross-Ibarra J, et al. Molecular parallelism underlies convergent highland adaptation of maize landraces. Mol Biol Evol. 2021;38(9): 3567–80. doi: 10.1093/molbev/msab119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Archambeault SL, Bärtschi LR, Merminod AD, Peichel CL. Adaptation via pleiotropy and linkage: association mapping reveals a complex genetic architecture within the stickleback Eda locus. Evol Lett. 2020;4(4): 282–301. doi: 10.1002/evl3.175 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Seear PJ, France MG, Gregory CL, Heavens D, Schmickl R, Yant L, et al. A novel allele of ASY3 is associated with greater meiotic stability in autotetraploid Arabidopsis lyrata. PLoS Genet. 2020;16(7): e1008900. doi: 10.1371/journal.pgen.1008900 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Roberts Kingman GA, Vyas DN, Jones FC, Brady SD, Chen HI, Reid K, et al. Predicting future from past: the genomic basis of recurrent and rapid stickleback evolution. Sci Adv. 2021;7(25): eabg5285. doi: 10.1126/sciadv.abg5285
  • 37. Yant L, Bomblies K. Genomic studies of adaptive evolution in outcrossing Arabidopsis species. Curr Opin Plant Biol. 2017;36: 9–14. doi: 10.1016/j.pbi.2016.11.018
  • 38. Schmickl R, Koch MA. Arabidopsis hybrid speciation processes. Proc Natl Acad Sci U S A. 2011;108(34): 14192–97. doi: 10.1073/pnas.1104212108
  • 39. Arnold B, Kim S-T, Bomblies K. Single geographic origin of a widespread autotetraploid Arabidopsis arenosa lineage followed by interploidy admixture. Mol Biol Evol. 2015;32(6): 1382–95. doi: 10.1093/molbev/msv089
  • 40. Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8(11): e1002967. doi: 10.1371/journal.pgen.1002967
  • 41. Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 2006;23(2): 254–67. doi: 10.1093/molbev/msj030
  • 42. Raj A, Stephens M, Pritchard JK. fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics. 2014;197(2): 573–89. doi: 10.1534/genetics.114.164350
  • 43. Martin SH, Dasmahapatra KK, Nadeau NJ, Salazar C, Walters JR, Simpson F, et al. Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Res. 2013;23(11): 1817–28. doi: 10.1101/gr.159426.113
  • 44. Booker TR, Yeaman S, Whitlock MC. Using genome scans to identify genes used repeatedly for adaptation. Evolution. 2023;77(3): 801–11. doi: 10.1093/evolut/qpac063
  • 45. Burri R. Interpreting differentiation landscapes in the light of long-term linked selection. Evol Lett. 2017;1(3): 118–31.
  • 46. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(D1): D447–52. doi: 10.1093/nar/gku1003
  • 47. Berardini TZ, Reiser L, Li D, Mezheritsky Y, Muller R, Strait E, et al. The Arabidopsis Information Resource: making and mining the ‘gold standard’ annotated reference plant genome. Genesis. 2015;53(8): 474–85. doi: 10.1002/dvg.22877
  • 48. De Rocher EJ, Harkins KR, Galbraith DW, Bohnert HJ. Developmentally regulated systemic endopolyploid in succulents with small genomes. Science. 1990;250(4977): 99–101. doi: 10.1126/science.250.4977.99
  • 49. Barow M. Endopolyploidy in seed plants. BioEssays. 2006;28(3): 271–81. doi: 10.1002/bies.20371
  • 50. Novikova PY, Hohmann N, Nizhynska V, Tsuchimatsu T, Ali J, Muir G, et al. Sequencing of the genus Arabidopsis identifies a complex history of nonbifurcating speciation and abundant trans-specific polymorphism. Nat Genet. 2016;48(9): 1077–82. doi: 10.1038/ng.3617
  • 51. Novikova PY, Hohmann N, Van de Peer Y. Polyploid Arabidopsis species originated around recent glaciation maxima. Curr Opin Plant Biol. 2018;42: 8–15. doi: 10.1016/j.pbi.2018.01.005
  • 52. Ansell SW, Stenøien HK, Grundmann M, Schneider H, Hemp A, Bauer N, et al. Population structure and historical biogeography of European Arabidopsis lyrata. Heredity. 2010;105(6): 543–53. doi: 10.1038/hdy.2010.10
  • 53. Guggisberg A, Liu X, Suter L, Mansion G, Fischer MC, Fior S, et al. The genomic basis of adaptation to calcareous and siliceous soils in Arabidopsis lyrata. Mol Ecol. 2018;27(24): 5088–103. doi: 10.1111/mec.14930
  • 54. Marques DA, Meier JI, Seehausen O. A combinatorial view on speciation and adaptive radiation. Trends Ecol Evol. 2019;34(6): 531–44. doi: 10.1016/j.tree.2019.02.008
  • 55. Leal JL, Milesi P, Hodková E, Zhou Q, James J, Eklund DM, et al. Complex polyploids: origins, genomic composition, and role of introgressed alleles. Syst Biol. 2024: syae012. doi: 10.1093/sysbio/syae012
  • 56. Hämälä T, Savolainen O. Genomic patterns of local adaptation under gene flow in Arabidopsis lyrata. Mol Biol Evol. 2019;36(11): 2557–71. doi: 10.1093/molbev/msz149
  • 57. Dukić M, Bomblies K. Male and female recombination landscapes of diploid Arabidopsis arenosa. Genetics. 2022;220(3): iyab236. doi: 10.1093/genetics/iyab236
  • 58. Pecinka A, Fang W, Rehmsmeier M, Levy AA, Mittelsten Scheid O. Polyploidization increases meiotic recombination frequency in Arabidopsis. BMC Biol. 2011;9: 24. doi: 10.1186/1741-7007-9-24
  • 59. McGregor AP, Orgogozo V, Delon I, Zanet J, Srinivasan DG, Payre F, et al. Morphological evolution through multiple cis-regulatory mutations at a single gene. Nature. 2007;448(7153): 587–90. doi: 10.1038/nature05988
  • 60. Moran RL, Richards EJ, Ornelas-García CP, Gross JB, Donny A, Wiese J, et al. Selection-driven trait loss in independently evolved cavefish populations. Nat Commun. 2023;14: 2557. doi: 10.1038/s41467-023-37909-8
  • 61. Novikova PY, Tsuchimatsu T, Simon S, Nizhynska V, Voronin V, Burns R, et al. Genome sequencing reveals the origin of the allotetraploid Arabidopsis suecica. Mol Biol Evol. 2017;34(4): 957–68. doi: 10.1093/molbev/msw299
  • 62. Hämälä T, Mattila TM, Leinonen PH, Kuittinen H, Savolainen O. Role of seed germination in adaptation and reproductive isolation in Arabidopsis lyrata. Mol Ecol. 2017;26(13): 3484–96. doi: 10.1111/mec.14135
  • 63. Mattila TM, Tyrmi J, Pyhäjärvi T, Savolainen O. Genome-wide analysis of colonization history and concomitant selection in Arabidopsis lyrata. Mol Biol Evol. 2017;34(10): 2665–77. doi: 10.1093/molbev/msx193
  • 64. Hämälä T, Mattila TM, Savolainen O. Local adaptation and ecological differentiation under selection, migration, and drift in Arabidopsis lyrata. Evolution. 2018;72(7): 1373–86. doi: 10.1111/evo.13502
  • 65. Preite V, Sailer C, Syllwasschy L, Bray S, Ahmadi H, Krämer U, et al. Convergent evolution in Arabidopsis halleri and Arabidopsis arenosa on calamine metalliferous soils. Philos Trans R Soc Lond B Biol Sci. 2019;374(1777): 20180243. doi: 10.1098/rstb.2018.0243
  • 66. Kolář F, Lučanová M, Záveská E, Fuxová G, Mandáková T, Španiel S, et al. Ecological segregation does not drive the intricate parapatric distribution of diploid and tetraploid cytotypes of the Arabidopsis arenosa group (Brassicaceae). Biol J Linn Soc. 2016;119(3): 673–88.
  • 67. Meirmans PG, Liu S, van Tienderen PH. The analysis of polyploid genetic data. J Hered. 2018;109(3): 283–96. doi: 10.1093/jhered/esy006
  • 68. Bohutínská M, Vlček J, Monnahan P, Kolář F. Population genomic analysis of diploid-autopolyploid species. Methods Mol Biol. 2023;2545: 297–324. doi: 10.1007/978-1-0716-2561-3_16
  • 69. Stift M, Kolář F, Meirmans PG. STRUCTURE is more robust than other clustering methods in simulated mixed-ploidy populations. Heredity. 2019;123(4): 429–41. doi: 10.1038/s41437-019-0247-6
  • 70. Jombart T. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics. 2008;24(11): 1403–05. doi: 10.1093/bioinformatics/btn129
  • 71. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123(3): 585–95. doi: 10.1093/genetics/123.3.585
  • 72. Kolář F, Fuxová G, Záveská E, Nagano AJ, Hyklová L, Lučanová M, et al. Northern glacial refugia and altitudinal niche divergence shape genome-wide differentiation in the emerging plant model Arabidopsis arenosa. Mol Ecol. 2016;25(16): 3929–49. doi: 10.1111/mec.13721
  • 73. Hudson RR, Slatkin M, Maddison WP. Estimation of levels of gene flow from DNA sequence data. Genetics. 1992;132(2): 583–89. doi: 10.1093/genetics/132.2.583
  • 74. Rawat V, Abdelsamad A, Pietzenuk B, Seymour DK, Koenig D, Weigel D, et al. Improving the annotation of Arabidopsis lyrata using RNA-Seq data. PLoS One. 2015;10(9): e0137391. doi: 10.1371/journal.pone.0137391
  • 75. Wang M, Zhao Y, Zhang B. Efficient test and visualization of multi-set intersections. Sci Rep. 2015;5: 16923. doi: 10.1038/srep16923
  • 76. Kyriakidou M, Tai HH, Anglin NL, Ellis D, Strömvik MV. Current strategies of polyploid plant genome sequence assembly. Front Plant Sci. 2018;9: 1660. doi: 10.3389/fpls.2018.01660
  • 77. Nei M. Genetic distance between populations. Am Nat. 1972;106(949): 283–92.

Decision Letter 0

Justin C Fay, Yalong Guo

12 Sep 2024

Dear Dr Schmickl,

Thank you very much for submitting your Research Article entitled 'Polyploids broadly generate novel haplotypes from trans-specific variation in Arabidopsis arenosa and Arabidopsis lyrata' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some concerns that we ask you address in a revised manuscript.

We therefore ask you to modify the manuscript according to the review recommendations. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, log into your Editorial Manager account and select the option 'Revise Submission' in the 'Submissions Needing Revision' folder.

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Yalong Guo, Ph.D.

Guest Editor

PLOS Genetics

Justin Fay

Section Editor

PLOS Genetics

The in-depth review of your manuscript by the three reviewers is now complete. Based on their assessment, it is clear that your manuscript requires some revision before it can be considered further for publication in PLOS Genetics. Comments from the external reviewers are included below.

In particular, please pay attention to the following issues:

1) As Reviewer #1 pointed out, "The aspect that may seem a bit surprising to some readers and that the authors do not stress is the fact that most genome scans studies comparing diploids and tetraploids end up finding roughly the same set of genes. Those are few and related to the same general function." It is important to discuss this question: "Are those the only changes or are we missing the rest?"

2) As Reviewer #3 pointed out, the importance of endopolyploidy in this study should be explained more clearly.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This is an interesting and carefully carried out study. The paper is well written but perhaps misses what seems, to this reader at least, one of the main conclusions/caveats. Many of the authors of the present manuscript have been working for a rather long time on tetraploid Arabidopsis species and have made important contributions, in particular to our understanding of the importance of genes related to meiosis during the shift from diploidy to tetraploidy. This paper is an additional contribution to this line of work. In previous studies, A. arenosa and A. lyrata were studied separately. Building on a massive dataset, the authors study them jointly and highlight the importance of introgression from other species in the evolution of genes differentiating the two ploidy levels. Their work confirms that polyploids can act as "bridges" where hybridization between diploids is impossible. As they point out in the discussion, this is not an isolated case, and other recent studies have pointed out the importance of introgression in generating potentially beneficial diversity within polyploid species complexes. The aspect that may seem a bit surprising to some readers and that the authors do not stress is the fact that most genome scans studies comparing diploids and tetraploids end up finding roughly the same set of genes. Those are few and related to the same general function. Not so long ago, the prevailing views in the literature on the effect of whole-genome doubling were more in line with a "shock and awe" event than with subtle and limited changes. Many recent studies have been more in line with the latter than with the former. It could be that this is simply a consequence of the methods used to compare diploids and polyploids, but in any case, I feel it would be worth pointing this out in the discussion. Are those the only changes or are we missing the rest? Although I would tend to think the latter holds, in my understanding the jury is still out.

Minor comments:

1. line 71: removal or weakening of hybridization barriers?

2. line 74: In the pre-genomic era, Slotte et al. (MBE 2008) was a rather convincing example of introgression from diploids to tetraploids rather than the converse.

3. lines 159-163: Perhaps a bit more information could be lifted from the supplementary file.

Reviewer #2: This manuscript presents a comprehensive study of genetic variation contributing to repeated adaptation in autopolyploid Arabidopsis lyrata and A. arenosa relative to their diploids. The authors employed genome-wide selection scanning methods to detect the genes and processes under selection across four tetraploid lineages, and then focused on 17 PSGs to characterize types of variants. Regarding haplotypes, the authors found that most tetraploid haplotypes are distinct from those in diploids. Regarding SNPs, the contributions of different evolutionary sources were measured: trans-specific polymorphism from diploids (65.5%), new mutations (31.9%), and tetraploid-specific inter-species hybridization (71.1%). These diverse SNP sources contribute to the novel, mosaic haplotypes bearing the repeated selection signatures of tetraploid adaptation.

This is a very well written manuscript and I appreciate the concise presentation of key findings. The authors also provide thorough supplemental materials to cover technical details and additional exploration such as the functional interpretation of candidate PSGs. I would like to offer only a few suggestions for improvement:

* Methodological Choice: While the authors used two genome-wide selection scanning methods, it would be beneficial to discuss the rationale for choosing these methods over other popular options like McDonald-Kreitman tests. Additionally, please elaborate on the suitability of these methods for diploid-autopolyploid species.

* Please include information on the PacBio HiFi read data used.

* L183: Clarify whether the term "haplotype" refers to haplotype blocks or physical haplotypes confirmed by HiFi reads.

* The higher proportion of trans-specific polymorphism sourced from A. arenosa compared to A. lyrata is intriguing. Besides the basal placement and larger genetic pool of A. arenosa, are there other possible explanations? For example, directionally biased gene flow or selection for functional alleles from A. arenosa?

* L448: Fig 5A should be Figure 4A

* L228-236: For the genomic background analysis, provide statistical test significance results from the permutation test to support the claim of lower trans-specific polymorphism, introgression, and new mutations.

* Only 11 PSG sequences were provided in Table S5.
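Reviewer #2's request for permutation-test significance on the genomic background analysis can be illustrated with a minimal sketch. It assumes each gene carries one summary value (for example, a hypothetical fraction of its SNPs assigned to trans-specific polymorphism); the observed mean over the candidate genes is compared with the means of many random, equally sized gene sets drawn from the genomic background, and the empirical p-value is the share of random sets at least as extreme. The function name `permutation_pvalue`, the one-sided `alternative` switch, and the toy values are illustrative assumptions, not the manuscript's actual procedure.

```python
import random
from statistics import mean


def permutation_pvalue(candidate, background, n_perm=10_000,
                       alternative="greater", seed=1):
    """One-sided empirical p-value for a candidate gene set (sketch).

    candidate, background: per-gene values of a summary statistic
    (hypothetical, e.g. the fraction of SNPs assigned to one source).
    alternative="greater" asks whether the candidate mean is higher than
    in random gene sets of the same size; "less" asks whether it is lower.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    k = len(candidate)
    observed = mean(candidate)
    hits = 0
    for _ in range(n_perm):
        # Random gene set of the same size, drawn without replacement
        null_mean = mean(rng.sample(background, k))
        if alternative == "greater" and null_mean >= observed:
            hits += 1
        elif alternative == "less" and null_mean <= observed:
            hits += 1
    # Add-one correction: the observed set counts as one permutation,
    # so the empirical p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Toy example with made-up values (not data from the study)
toy_background = [i / 100 for i in range(100)]
observed, p = permutation_pvalue([0.95, 0.97, 0.99], toy_background)
```

With this toy background, the high-valued candidate set yields a very small p-value, while a set such as `[0.5, 0.5, 0.5]` yields one roughly around 0.5. Using the mean as the test statistic, and leaving it to the analyst whether candidate genes are excluded from the background pool, are design choices of this sketch.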

Reviewer #3: This is a compact, well-written manuscript presenting some very interesting results. For me it was a good wake-up call to be more attentive to the Arabidopsis polyploidy system than I have been. Despite knowing some of the story from these authors concerning evolution involving meiosis-related genes in polyploids, it is clear that I am woefully behind in reading about the overall Arabidopsis evolutionary picture, particularly the issue of trans-specific variation involving wide hybridization. That, of course, plays a key role in the research reported here. The population genomic resources used in the study are quite impressive—the envy of those not working on near-model species—and are leveraged to excellent effect in this study.

The authors report that autopolyploids classified as A. lyrata or A. arenosa, though bearing different names, share haplotypes through post-polyploidy hybridization/introgression. That is a very interesting finding, one that I was more familiar with from other systems and from theoretical considerations, nicely demonstrated here.

Next, they identify 17 genes that are under positive selection in the polyploids (PSGs); these are either related to meiosis or to other cellular-level phenomena (notably cell cycle). This makes sense given previous work. What I found (find) difficult to understand fully was the emphasis, particularly in the Supplementary text, on endopolyploidy. I do not know the literature on the significance of endopolyploidy in natural Arabidopsis populations that is cited here. I do know something about endopolyploidy in synthetic A. thaliana polyploids (e.g., Corneille et al. 2017, Plant Physiology; Robinson et al. 2018, Plant Cell), but here the argument seems to be that the degree of endopolyploidy is associated with adaptation. This should be explained better (see comment below), but I seem to have gotten the main interesting points of the study without understanding this fully—there are 17 genes that show signs of positive selection in the polyploids in a way that is consistent with adaptation to an altered cellular landscape in which function is challenged by the presence of a doubled genome.

The authors then show that for the 12 genes with sufficient SNPs, tetraploids possess a major haplotype not found in diploids, including putative diploid source populations. Strikingly, these tetraploid-specific haplotypes have sequences that are a mosaic of different sources—not only the two diploid cytotypes of the same two species whose names the polyploids bear or from de novo mutations in the tetraploids, but in a significant percentage of cases coming from other Arabidopsis species. SNPs in each gene are traced to their sources in one of 7 scenarios. Fig. 4C is very helpful in illustrating what was done to produce the data summarized in 4B; 4A took a bit of time to decipher, but I don’t have any good suggestion of how to make it more clear (I think part of the problem may be my interpretation of “standing in diploids”, which can mean as few as a single species, and might be clearer with a longer description such as “standing in one or more diploid species” … but there isn’t much room in this already crowded, compact figure). Networks for each gene are presented in Supplementary Fig. 7, which I found very useful (see comment below).

One thing that seemed lacking here was an indication of where the SNPs under selection fall, i.e., those identified in the “candidate SNP” approach that recovered all of the PICMIN genes plus three additional candidates. Presumably at least some of these are among those shown for the two genes in Fig. 4C, and all of them can be placed in one of the 7 scenarios. I thought I might find that information in Supplemental Dataset 3, but (as noted below) I found that spreadsheet undecipherable.

In the Discussion, the authors present a narrative involving origins of the two polyploids and subsequent introgression, with selection on genes contributing to meiotic (or more broadly, cytological) stability. I have already commented on my inability to understand how this relates to endopolyploidy, which is a perfectly natural part of the normal functioning of some plant cell types. It seems to me that the endopolyploidy part is—or at least comes across as—a rather peripheral aspect of this paper, disconnected from the major points that are made.

In lines 311-312 the authors state, “Altogether, we demonstrated that extensive reshuffling of trans-specific, species-specific, and novel variation occurred in response to the challenges of WGD.” Most of this seems well-supported by the results, but “in response to” sounds rather teleological, as though the polyploid was looking for a way to adapt. Perhaps I am being too critical, but I wondered at one point whether genes other than the 12 PSGs studied here are also mosaics. If so, then the combining of various sources of variation is not “in response to” anything in particular—it simply is a product of the biology of these Arabidopsis lineages. The authors may be admitting the same thing by saying that this kind of mosaicism is not confined to polyploids, but also in diploids (lines 315-316) where clearly it is not “in response to the challenges of WGD”, at least not in the immediate past. Given the variation in the distribution of sources of SNPs across the 12 genes (Fig. 4B), and the similarity of the “genome” distribution in the same figure, I would be surprised if non-PSGs didn’t show the same range of variation in terms of sources of SNPs.

None of this diminishes the value of this work, which I think is of considerable interest, and as best as I can tell has been executed competently. I think it will be a useful contribution to the area of polyploid evolution.

Specific comments, by line number:

Line 59. “Autopolyploid” is here defined in the commonly used taxonomic sense, which I would argue is genetically misleading. There is an excellent very recent paper on this topic from Anneliese Mason (one of the stars in the polyploidy field) that I think the authors may want to cite here:

Lv, Z., Addo Nyarko, C., Ramtekey, V., Behn, H., and Mason, A.S. (2024). Defining autopolyploidy: Cytology, genetics, and taxonomy. American Journal of Botany 111, e16292.

117ff. I’m a bit puzzled by the lack of any evidence of reticulation in the TreeMix figures (2B, 2C); this isn’t a program I have used recently, but my recollection is that it is used to show gene flow (e.g., via introgression), which is clearly shown later in Fig. 2D. The legend has no explanation of how to interpret the topologies of these figures, or perhaps the lack of cycling in the graph.

148. Dataset 3. I looked at this spreadsheet and could not make much sense of it. It would be very helpful if an explanation of some kind were included somewhere, preferably at the top of the file as in a regular text table.

159-160. “PSGs associated with the cell cycle trait endoreduplication on tetraploids”: This is awkwardly phrased. Maybe add commas or quotes?

173-174. “tetraploids aggregate as mosaics from different allelic sources (Fig. 1B)”:

250. “polyploidy-mediated hybrid seed rescue” is not a term with which I am familiar, at least by name. I took a quick look at Ref. 15 to see what is meant, and found that it relates to endosperm balance number. If there is a way to explain the phenomenon succinctly here instead of using what is a rather jargon-ish term, that could be helpful to readers like me.

267-268. “We found a substantial decrease in endoreduplication within established tetraploids compared to newly synthesized tetraploids.”: What is the adaptive significance of reduced endoreduplication? It is a normal part of development in many species, in which case it presumably is beneficial. My sense from the literature is that the effect of whole-plant doubling on endopolyploidy is variable.

299. “retainment”: Probably "retention" would be a better word here.

343-344. “Arabidopsis arenosa and A. lyrata autotetraploids show random segregation of chromosomes [32, 38], resulting in allele frequency estimation comparable to diploids”: Random segregation in an autopolyploid should produce tetrasomic ratios, not disomic ratios as in diploids. Consequently, I’m not sure what is meant here. See Lv et al. (2024) for discussion of the relationships among segregation ratios, bivalents/multivalents, and auto/allopolyploidy.
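Reviewer #3's point about segregation ratios can be made concrete with a short sketch. Assuming purely random chromosome segregation (every set of half the homologues is equally likely to enter a gamete) and ignoring double reduction, a diploid heterozygote (Aa) produces gametes in a 1 A : 1 a ratio, whereas a duplex autotetraploid (AAaa) produces them in a 1 AA : 4 Aa : 1 aa ratio, the classic tetrasomic expectation. The function `gamete_frequencies` and its genotype-string encoding are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def gamete_frequencies(genotype):
    """Expected gamete frequencies under random chromosome segregation.

    genotype: one allele symbol per homologous chromosome, e.g. "Aa"
    (diploid) or "AAaa" (duplex autotetraploid). Each gamete receives
    half of the homologues; every such subset is treated as equally
    likely, and double reduction is ignored (simplifying assumption).
    """
    n = len(genotype)
    if n == 0 or n % 2:
        raise ValueError("expected a non-empty, even number of homologues")
    # Combine chromosome positions, not allele symbols, so that identical
    # alleles on different homologues count as distinct draws.
    subsets = combinations(range(n), n // 2)
    counts = Counter("".join(sorted(genotype[i] for i in s)) for s in subsets)
    total = sum(counts.values())
    return {gamete: c / total for gamete, c in sorted(counts.items())}


# Disomic (diploid) vs tetrasomic (autotetraploid) expectations
diploid = gamete_frequencies("Aa")       # A and a, each 1/2
tetraploid = gamete_frequencies("AAaa")  # AA 1/6, Aa 4/6, aa 1/6
```

Applied to a simplex tetraploid ("Aaaa"), the same function gives 1 Aa : 1 aa; the tetraploid patterns differ from the diploid one because each gamete carries two alleles rather than one.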

Supplementary text 1: “In summary, these results point to an adaptive compensation for DNA content per nucleus: while nearly double in neo-tetraploids, it returns toward diploid levels in established tetraploids (Fig. S4).” Genome downsizing references (e.g., Leitch and others) could be useful here; this is not a new concept in the polyploidy literature.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Jeff J. Doyle

Decision Letter 1

Justin C Fay, Yalong Guo

28 Nov 2024

Dear Dr Schmickl,

We are pleased to inform you that your manuscript entitled "Polyploids broadly generate novel haplotypes from trans-specific variation in Arabidopsis arenosa and Arabidopsis lyrata" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Yalong Guo, Ph.D.

Guest Editor

PLOS Genetics

Justin Fay

Section Editor

PLOS Genetics

Aimée Dudley

Editor-in-Chief

PLOS Genetics

Anne Goriely

Editor-in-Chief

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I have read the replies to the comments of the three reviewers as well as the revised manuscript and I feel that the authors have taken into account all of our comments and that the manuscript can now be published.

Reviewer #2: The authors have satisfactorily addressed my concerns.

Reviewer #3: The authors have addressed the concerns I raised in my review of the initial submission. The only substantive issue concerned endopolyploidy, and I feel that their response to my review was acceptable.

The study as described in the initial submission was well-conceived and executed, with convincing results and appropriate discussion. I believe it to be a very useful contribution to the literature on polyploid evolution.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Jeff J. Doyle

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-24-00791R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Justin C Fay, Yalong Guo

16 Dec 2024

PGENETICS-D-24-00791R1

Polyploids broadly generate novel haplotypes from trans-specific variation in Arabidopsis arenosa and Arabidopsis lyrata

Dear Dr Schmickl,

We are pleased to inform you that your manuscript entitled "Polyploids broadly generate novel haplotypes from trans-specific variation in Arabidopsis arenosa and Arabidopsis lyrata" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Dorothy Lannert

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Population structure of A. arenosa (A) and A. lyrata (A-D).

    A: PCA based on 50,000 SNPs from scaffold_1 shows separation between A. arenosa and A. lyrata. (B-D): PCA, a Nei’s distance-based neighbor-joining tree, and a fastSTRUCTURE plot all support three lineages of tetraploid A. lyrata, named AL Germany, AL Czechia, and AL Austria. For these analyses we used 1,094,553 genome-wide four-fold degenerate SNPs. Red: diploid populations; blue: tetraploid populations.

    (EPS)

    pgen.1011521.s001.eps (184.6KB, eps)
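The S1 Fig (and S7 Fig) trees are built from Nei's genetic distance. As an illustration only, the sketch below computes Nei's (1972) standard distance D = -ln(I), where I = J_XY / sqrt(J_X * J_Y) and the J terms are averaged over loci, between two populations from per-locus allele frequencies. The captions do not state whether distances were computed among populations or among individuals, nor which software was used, so the function name and input layout here are hypothetical. A pairwise matrix of such distances would then be passed to a neighbor-joining routine, which is not shown.

```python
import math
from typing import Sequence


def nei_distance(pop_x: Sequence[Sequence[float]],
                 pop_y: Sequence[Sequence[float]]) -> float:
    """Nei's (1972) standard genetic distance between two populations.

    Hypothetical input layout: pop_x[l][a] is the frequency of allele a at
    locus l in population X (frequencies at each locus sum to 1); pop_y has
    the same layout. A biallelic SNP with alternate-allele frequency p is
    encoded as [p, 1 - p].
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations need the same, non-zero number of loci")
    n_loci = len(pop_x)
    # Homozygosities J_X, J_Y and cross-product J_XY, each averaged over loci
    j_x = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    j_y = sum(sum(q * q for q in locus) for locus in pop_y) / n_loci
    j_xy = sum(sum(p * q for p, q in zip(lx, ly))
               for lx, ly in zip(pop_x, pop_y)) / n_loci
    identity = j_xy / math.sqrt(j_x * j_y)  # Nei's genetic identity I
    if identity <= 0.0:
        return math.inf  # no shared alleles at any locus
    # I <= 1 in exact arithmetic (Cauchy-Schwarz); clamp rounding noise to 0
    return max(0.0, -math.log(identity))


# Two populations, two biallelic SNPs (made-up frequencies)
print(nei_distance([[0.9, 0.1], [0.2, 0.8]], [[0.4, 0.6], [0.3, 0.7]]))
```

Averaging the J terms over loci before taking the ratio follows Nei's original definition; averaging per-locus distances instead would give a different number.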
    S2 Fig. Location of candidate genes (black vertical lines) on A. lyrata reference chromosomes, colored by bins of per-gene recombination rate.

    The figure indicates that gene candidates do not cluster in regions with extreme values of recombination rate per gene (blue), as estimated based on the available A. lyrata recombination map [56].

    (EPS)

    pgen.1011521.s002.eps (58.1KB, eps)
    S3 Fig. Candidate positively selected genes (PSGs) involved in processes of cell cycle regulation, meiosis, and transcriptional regulation vary in their proportion of differentiated cis-regulatory, nonsynonymous, and synonymous SNPs.

    The number above each bar corresponds to the number of differentiated SNPs within each gene. Approximately 5 million genic and cis-regulatory SNPs from whole genome sequencing data were used as controls.

    (EPS)

    pgen.1011521.s003.eps (51.6KB, eps)
    S4 Fig. Phenotypic shift associated with the establishment of tetraploids of A. arenosa and A. lyrata in the form of a decrease in the level of DNA endoreduplication.

    Top: Proportion of endoreduplicated nuclei in leaves of 10 individuals per lineage and ploidy. **: p < 0.01, ****: p < 0.0001, ns: nonsignificant, Wilcoxon rank sum test. The horizontal violet line shows the level of endoreduplication expected under a scenario of no post-WGD adaptation, estimated from synthetic neo-tetraploids of A. arenosa. Bottom: Prediction for the compensatory evolution of polyploid traits from [2] and observed average DNA content per leaf cell (calculated as the mean number of homologous chromosomes per nucleus) for diploid, neo-tetraploid, and tetraploid A. arenosa.

    (EPS)

    pgen.1011521.s004.eps (58.2KB, eps)
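The S4 and S5 Fig comparisons use a Wilcoxon rank sum test. The sketch below shows how that test works in its common large-sample form: pool both samples, assign mid-ranks to ties, convert the rank sum of one sample into a Mann-Whitney U statistic, and compare U with its null mean and tie-corrected variance through a normal approximation. The captions do not say which variant or software was used; with 10 individuals per group an exact test can give somewhat different p-values, so this is an illustration rather than a reproduction of the reported values, and the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction and no continuity correction.
    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, tagging each value with its group (0 = x, 1 = y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0  # sum of (t^3 - t) over groups of t tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        rank_sum_x += mid_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a shift
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, min(1.0, p_value)
```

In practice a tested implementation (for example R's `wilcox.test` or SciPy's `mannwhitneyu`) is preferable; the sketch only makes the ranking logic explicit.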
    S5 Fig. Comparison of leaf endoreduplication among diploids.

    Upper boxplots show the endoreduplication level, calculated as the number of endoreduplicated nuclei divided by the total number of nuclei analyzed; bottom plots show the maximum number of endoreduplication cycles in the leaf for each diploid lineage. ****: p < 0.0001, ***: p < 0.001, ns: nonsignificant, Wilcoxon rank sum test. Each boxplot represents 10 individuals.

    (EPS)

    pgen.1011521.s005.eps (204.9KB, eps)
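The S4 and S5 Fig captions define three per-sample summaries: the endoreduplication level (endoreduplicated nuclei divided by all nuclei), the maximum number of endoreduplication cycles, and the mean number of homologous chromosome copies per nucleus. A minimal sketch of that arithmetic follows. It assumes, hypothetically, that flow-cytometry nuclei have already been assigned to classes by number of completed endocycles, and that each endocycle doubles the copies of every chromosome (so a nucleus of a plant with ploidy x after k endocycles carries x * 2^k copies). Peak calling and the distinction between G2 and endoreduplicated nuclei, which S2 Text describes, are outside this sketch.

```python
from typing import Sequence, Tuple


def endoreduplication_summary(nuclei_per_cycle: Sequence[int],
                              ploidy: int) -> Tuple[float, float, int]:
    """Summaries of one leaf sample's nuclei counts.

    nuclei_per_cycle[k] is the (hypothetical) number of nuclei assigned to
    k completed endocycles; index 0 holds non-endoreduplicated nuclei.
    Returns (endoreduplication level, mean homologous chromosome copies per
    nucleus, maximum number of endocycles observed).
    """
    total = sum(nuclei_per_cycle)
    if total == 0:
        raise ValueError("no nuclei counted")
    # Level: endoreduplicated nuclei divided by all nuclei
    level = (total - nuclei_per_cycle[0]) / total
    # Assumption: each completed endocycle doubles the copies of every chromosome
    mean_copies = sum(n * ploidy * 2 ** k
                      for k, n in enumerate(nuclei_per_cycle)) / total
    # Highest endocycle class that contains at least one nucleus
    max_cycles = max(k for k, n in enumerate(nuclei_per_cycle) if n > 0)
    return level, mean_copies, max_cycles


# Made-up diploid sample: 50 nuclei with 0, 30 with 1, 20 with 2 endocycles
print(endoreduplication_summary([50, 30, 20], ploidy=2))  # (0.5, 3.8, 2)
```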
    S6 Fig. Variable allelic sources of tetraploid haplotypes for each of the 12 tetraploid positively selected genes (PSGs).

    Barplots show the proportion of candidate SNPs representing each of the seven source scenarios (see Fig 4A for graphical visualization of scenarios). The upper plot includes singletons; the bottom plot shows results after filtering singletons out.

    (EPS)

    pgen.1011521.s006.eps (76.6KB, eps)
    S7 Fig. Nei’s distance-based neighbor-joining networks of tetraploid positively selected genes (PSGs) across diploid Arabidopsis (A. arenosa, A. lyrata, A. croatica, A. cebennensis, and A. pedemontana) and tetraploid Arabidopsis (A. arenosa, A. lyrata).

    Tetraploids from all four tetraploid lineages of A. arenosa and A. lyrata (blue) always form a single lineage, suggesting the presence of a single shared tetraploid haplotype at the locus. Further, they form an unresolved network with diploids of multiple species (red), suggesting diversity of allelic sources of each shared tetraploid haplotype (‘mosaic scenario’).

    (EPS)

    pgen.1011521.s007.eps (155.4KB, eps)
    S1 Table. Genome-wide nucleotide diversity and Tajima’s D of A. lyrata populations newly sequenced here, calculated over four-fold degenerate sites.

    (DOCX)

    pgen.1011521.s008.docx (6.5KB, docx)
    S2 Table. Summary of the 17 candidate tetraploid positively selected genes (PSGs).

    (DOCX)

    pgen.1011521.s009.docx (8.3KB, docx)
    S3 Table. Functional annotation of the 17 tetraploid positively selected genes (PSGs).

    (DOCX)

    pgen.1011521.s010.docx (9.9KB, docx)
    S4 Table. Presence and frequency of tetraploid, two diploid, and other haplotype blocks in all 61 tetraploid populations of A. arenosa and A. lyrata (479 individuals).

    AF: allele frequency of haplotype.

    (DOCX)

    pgen.1011521.s011.docx (6.5KB, docx)
    S5 Table. Variable allelic sources of tetraploid haplotypes for the 232 candidate SNPs of the 12 positively selected genes (PSGs) versus 232 synonymous SNPs representing the genomic background, randomly sampled 1,000 times.

    The proportion of SNPs representing each of the seven source scenarios is given (see Fig 4A for graphical visualization of scenarios).

    (DOCX)

    pgen.1011521.s012.docx (6.3KB, docx)
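S5 Table contrasts the 232 candidate SNPs with 1,000 random draws of 232 synonymous background SNPs. One way to set up such a comparison is sketched below: each background SNP carries a source-scenario label, each replicate draws 232 of them without replacement, and the per-scenario proportions across replicates form a null distribution against which the candidate proportions can be read. The scenario labels, the seed, and the 2.5th/97.5th percentile summary are assumptions for illustration; the table itself reports proportions and does not specify how the draws were summarized.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def resample_background(background: Sequence[str], scenarios: Sequence[str],
                        sample_size: int = 232, n_reps: int = 1000,
                        seed: int = 1) -> Dict[str, List[float]]:
    """Per-scenario proportions in n_reps random draws of background SNPs.

    background[i] is the (hypothetical) source-scenario label of background
    SNP i; each replicate samples sample_size SNPs without replacement.
    """
    rng = random.Random(seed)
    pool = list(background)
    props: Dict[str, List[float]] = {s: [] for s in scenarios}
    for _ in range(n_reps):
        counts = Counter(rng.sample(pool, sample_size))
        for s in scenarios:
            props[s].append(counts[s] / sample_size)
    return props


def compare_to_background(candidate: Sequence[str],
                          props: Dict[str, List[float]]
                          ) -> Dict[str, Tuple[float, float, float, float]]:
    """Per scenario: (observed proportion, background mean, 2.5th pct, 97.5th pct)."""
    observed = Counter(candidate)
    summary = {}
    for s, dist in props.items():
        ranked = sorted(dist)
        last = len(ranked) - 1
        summary[s] = (observed[s] / len(candidate),
                      sum(ranked) / len(ranked),
                      ranked[round(0.025 * last)],
                      ranked[round(0.975 * last)])
    return summary
```

A candidate proportion falling outside the background percentile range would suggest that candidate SNPs differ from the genomic background in that source scenario.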
    S1 Data. Metadata and sequence quality checks for the 983 whole genome sequenced individuals.

    (XLSX)

    pgen.1011521.s013.xlsx (163.2KB, xlsx)
    S2 Data. Sampling locations of the 129 populations.

    (XLSX)

    pgen.1011521.s014.xlsx (13.7KB, xlsx)
    S3 Data. Set of 54 significant PicMin windows and genome-wide PicMin results underlying the PicMin Manhattan plot.

    (XLSX)

    pgen.1011521.s015.xlsx (603.8KB, xlsx)
    S4 Data. List of outlier genes identified using the ‘candidate SNP’ approach and their overlap among tetraploid lineages.

    Candidate SNPs were used in the selection scan to identify positively selected genes (PSGs), following a three-step procedure. First, 1% outlier SNPs were identified; second, the top quartile of genes with the highest density of outlier SNPs was identified; and third, these genes were overlapped among the four tetraploid lineages of A. arenosa and A. lyrata to identify repeatedly differentiated genes. The PSGs and their overlap are shown here.

    (XLSX)

    pgen.1011521.s016.xlsx (9.1KB, xlsx)
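The three-step ‘candidate SNP’ procedure described for S4 Data can be sketched as follows. Several details are not given in the caption and are assumptions here: the per-SNP differentiation statistic is represented by a generic score, outlier-SNP density is taken as outlier SNPs per base pair of gene length, the top quartile is computed among genes containing at least one outlier SNP, and "overlap" is treated as the intersection across all lineages. All function and variable names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical per-SNP record: (gene_id, differentiation score in one lineage)
Snp = Tuple[str, float]


def upper_cutoff(values: Sequence[float], quantile: float) -> float:
    """Value at the given upper quantile of `values` (simple order statistic)."""
    ranked = sorted(values)
    return ranked[min(len(ranked) - 1, math.floor(quantile * len(ranked)))]


def outlier_genes(snps: List[Snp], gene_length: Dict[str, int],
                  snp_quantile: float = 0.99,
                  gene_quantile: float = 0.75) -> Set[str]:
    if not snps:
        return set()
    # Step 1: keep the top 1% of SNPs by differentiation score
    cutoff = upper_cutoff([score for _, score in snps], snp_quantile)
    n_outliers = Counter(gene for gene, score in snps if score >= cutoff)
    # Step 2: outlier-SNP density per bp, keep the top quartile of genes
    density = {g: n / gene_length[g] for g, n in n_outliers.items()}
    d_cut = upper_cutoff(list(density.values()), gene_quantile)
    return {g for g, d in density.items() if d >= d_cut}


def repeated_candidates(per_lineage: Dict[str, List[Snp]],
                        gene_length: Dict[str, int]) -> Set[str]:
    # Step 3: keep genes that are outliers in every tetraploid lineage
    gene_sets = [outlier_genes(snps, gene_length) for snps in per_lineage.values()]
    return set.intersection(*gene_sets) if gene_sets else set()
```

Requiring overlap in all lineages is the strictest reading of "repeatedly differentiated"; a looser rule (for example, at least two lineages) would only change the final step.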
    S5 Data. Sequences of the 12 positively selected genes (PSGs), assembled using long read sequencing of five diploid and five tetraploid individuals.

    (XLSX)

    pgen.1011521.s017.xlsx (76.8KB, xlsx)
    S6 Data. Table of linked candidate SNPs, as determined using long read sequencing.

    Position of variants within the same column (columns D-N) highlights their physical linkage within a gene region (black boxes). Note that these linked candidate SNPs were used as markers to reconstruct haplotypes from short read data, which is summarized in columns Q-AMJ.

    (XLSX)

    pgen.1011521.s018.xlsx (806.3KB, xlsx)
    S7 Data. Candidate SNPs marking haplotypes of CYCA2;3 and PDS5b (Figs 4C and 4D), as found on the same long read.

    (XLSX)

    pgen.1011521.s019.xlsx (36.7KB, xlsx)
    S8 Data. Distribution of the 232 tetraploid candidate SNPs among the 504 diploid samples used to estimate the likely sources of tetraploid haplotypes.

    (XLSX)

    pgen.1011521.s020.xlsx (20.6KB, xlsx)
    S9 Data. Metadata and sequence quality checks for the 10 long read-sequenced individuals using PacBio HiFi.

    (XLSX)

    pgen.1011521.s021.xlsx (6.7KB, xlsx)
    S1 Text. Detailed functional interpretations of the candidate positively selected genes (PSGs).

    (DOCX)

    pgen.1011521.s022.docx (8.2KB, docx)
    S2 Text. Detailed methods and results of estimating DNA endoreduplication using flow cytometry.

    (DOCX)

    S3 Text. Hypotheses about the spatio-temporal context of the origin of the ‘mosaic’ haplotypes.

    (DOCX)

    pgen.1011521.s024.docx (8.1KB, docx)
    Attachment

    Submitted filename: BohutinskaEtAl_PlosGenetics_response_041124.pdf

    pgen.1011521.s025.pdf (359.9KB, pdf)

    Data Availability Statement

    Sequence data that support the findings of this study are deposited in NCBI (https://www.ncbi.nlm.nih.gov/bioproject/) under BioProjects PRJNA284572, PRJNA309929, PRJNA357693, PRJNA357372, PRJNA459481, PRJNA493227, PRJEB34247 (ENA), PRJNA506705, PRJNA484107, PRJNA592307, PRJNA667586, and PRJNA929698. See S1 Data for individual codes.

    ScanTools_ProtEvol pipeline: github.com/mbohutinska/ScanTools_ProtEvol
    ABBA-BABA pipeline: github.com/simonhmartin/tutorials/tree/master/ABBA_BABA_whole_genome
    PicMin: github.com/TBooker/PicMin
    Allele frequencies of haplotype blocks: github.com/mbohutinska/repeatedWGD, section ‘Haplotype AF’.


    Articles from PLOS Genetics are provided here courtesy of PLOS
