A population-level invasion by transposable elements triggers genome expansion in a fungal pathogen

Ursula Oggenfuss; Thomas Badet; Thomas Wicker; Fanny E Hartmann; Nikhil Kumar Singh; Leen Abraham; Petteri Karisto; Tiziana Vonlanthen; Christopher Mundt; Bruce A McDonald; Daniel Croll

doi:10.7554/eLife.69249

. 2021 Sep 16;10:e69249. doi: 10.7554/eLife.69249

A population-level invasion by transposable elements triggers genome expansion in a fungal pathogen

Ursula Oggenfuss ¹, Thomas Badet ¹, Thomas Wicker ², Fanny E Hartmann ^3,⁴, Nikhil Kumar Singh ¹, Leen Abraham ¹, Petteri Karisto ^4,^†, Tiziana Vonlanthen ⁴, Christopher Mundt ⁵, Bruce A McDonald ⁴, Daniel Croll ^1,^✉

Editors: Detlef Weigel⁶, Detlef Weigel⁷

PMCID: PMC8445621 PMID: 34528512

Abstract

Genome evolution is driven by the activity of transposable elements (TEs). The spread of TEs can have deleterious effects including the destabilization of genome integrity and expansions. However, the precise triggers of genome expansions remain poorly understood because genome size evolution is typically investigated only among deeply divergent lineages. Here, we use a large population genomics dataset of 284 individuals from populations across the globe of Zymoseptoria tritici, a major fungal wheat pathogen. We built a robust map of genome-wide TE insertions and deletions to track a total of 2456 polymorphic loci within the species. We show that purifying selection substantially depressed TE frequencies in most populations, but some rare TEs have recently risen in frequency and likely confer benefits. We found that specific TE families have undergone a substantial genome-wide expansion from the pathogen’s center of origin to more recently founded populations. The most dramatic increase in TE insertions occurred between a pair of North American populations collected in the same field at an interval of 25 years. We find that both genome-wide counts of TE insertions and genome size have increased with colonization bottlenecks. Hence, the demographic history likely played a major role in shaping genome evolution within the species. We show that both the activation of specific TEs and relaxed purifying selection underpin this incipient expansion of the genome. Our study establishes a model to recapitulate TE-driven genome evolution over deeper evolutionary timescales.

Research organism: Other

Introduction

Transposable elements (TEs) are mobile repetitive DNA sequences with the ability to independently insert into new regions of the genome. TEs are major drivers of genome instability and epigenetic change (Eichler and Sankoff, 2003). Insertion of TEs can disrupt coding sequences, trigger chromosomal rearrangements, or alter expression profiles of adjacent genes (Lim, 1988; Petrov et al., 2003; Slotkin and Martienssen, 2007; Hollister and Gaut, 2009; Oliver et al., 2013). Hence, TE activity can have phenotypic consequences and impact host fitness. While TE insertion dynamics are driven by the selfish interest for proliferation, the impact on the host can range from beneficial to highly deleterious. The most dramatic examples of TE insertions underpinned rapid adaptation of populations or species (Feschotte, 2008; Chuong et al., 2017), particularly following environmental change or colonization events. Beneficial TE insertions are expected to experience strong positive selection and rapid fixation in populations. However, most TE insertions have neutral or deleterious effects upon insertions. Purifying selection is expected to rapidly eliminate deleterious insertions from populations unless constrained by genetic drift (Walser et al., 2006; Baucom et al., 2009; Cridland et al., 2013; Stuart et al., 2016; Lai et al., 2017; Stritt et al., 2018). Additionally, genomic defense mechanisms can disable transposition activity. Across eukaryotes, epigenetic silencing is a shared defense mechanism against TEs (Slotkin and Martienssen, 2007). Fungi evolved an additional and highly specific defense system introducing repeat-induced point (RIP) mutations into any nearly identical set of sequences. The relative importance of demography, selection, and genomic defenses determining the fate of TEs in populations remain poorly understood.

A crucial property predicting the invasion success of TEs in a genome is the transposition rate. TEs tend to expand through family-specific bursts of transposition followed by prolonged phases of transposition inactivity. Bursts of insertions of different retrotransposon families were observed across eukaryotic lineages including Homo sapiens, Zea mays, Oryza sativa, and Blumeria graminis (Shen et al., 1991; SanMiguel et al., 1998; Eichler and Sankoff, 2003; Piegu et al., 2006; Lu et al., 2017; Frantzeskakis et al., 2018). Prolonged bursts without effective counterselection are thought to underpin genome expansions. In the symbiotic fungus Cenococcum geophilum, the burst of TEs resulted in a dramatically expanded genome compared to closely related species (Peter et al., 2016). Similarly, a burst of a TE family in brown hydras led to an approximately threefold increase of the genome size compared to related hydras (Wong et al., 2019). Across the tree of life, genome sizes vary by orders of magnitude and enlarged genomes invariably show hallmarks of historic TE invasions (Kidwell, 2002). Population size variation is among the few correlates of genome size across major groups, suggesting that the efficacy of selection plays an important role in controlling TE activity (Lynch, 2007). Reduced selection efficacy against deleterious TE insertions is expected to lead to a ratchet-like increase in genome size. In fungi, TE-rich genomes often show an isochore structure alternating gene-rich and TE-rich compartments (Rouxel et al., 2011). TE-rich compartments often harbor rapidly evolving genes such as effector genes in pathogens or resistance genes in plants (Raffaele and Kamoun, 2012; Jiao and Schneeberger, 2019). Taken together, incipient genome expansions are likely driven by population-level TE insertion dynamics.

The fungal wheat pathogen, Zymoseptoria tritici, is one of the most important pathogens on crops, causing high yield losses in many years (Torriani et al., 2015). Z. tritici emerged during the domestication of wheat in the Fertile Crescent where the species retained high levels of genetic variation (Zhan et al., 2005; Stukenbrock et al., 2011). The pathogen migrated to all temperate zones where wheat is currently grown and underwent multiple migration bottlenecks, in particular when colonizing Oceania and North America (Zhan et al., 2005; Estep et al., 2015). The genome is completely assembled and shows size variation between individuals sampled across the global distribution range (Feurtey et al., 2020; Badet et al., 2020; Goodwin et al., 2011). The TE content of the genome shows a striking variation of 17–24% variation among individuals (Badet et al., 2020). Z. tritici recently gained major TE-mediated adaptations to colonize host plants and tolerate environmental stress (Omrane et al., 2015; Omrane et al., 2017; Krishnan et al., 2018; Meile et al., 2018). Clusters of TEs are often associated with genes encoding important pathogenicity functions (i.e. effectors), recent gene gains or losses (Hartmann and Croll, 2017), and major chromosomal rearrangements (Croll et al., 2013; Plissonneau et al., 2016). Transposition activity of TEs also had a genome-wide impact on gene expression profiles during infection (Fouché et al., 2019). The well-characterized demographic history of the pathogen and evidence for recent TE-mediated adaptations make Z. tritici an ideal model to recapitulate the process of TE insertion dynamics, adaptive evolution, and changes in genome size at the population level.

Here, we retrace the population-level context of TE insertion dynamics and genome size changes across the species range by analyzing populations sampled on four continents for a total of 284 genomes. We developed a robust pipeline to detect newly inserted TEs using short read sequencing datasets. Combining analyses of selection and knowledge of the colonization history of the pathogen, we tested whether population bottlenecks were associated with substantial changes in the TE content and the size of genomes.

Results

A dynamic TE landscape shaped by strong purifying selection

We detected 4753 TE copies, grouped into 30 families with highly variable copy numbers in the reference genome IPO323 (Figure 2—source data 1 and Figure 2—figure supplement 1A). To establish a comprehensive picture of within-species TE dynamics, we analyzed 295 genomes from a worldwide set of six populations spanning the distribution range of the wheat pathogen Z. tritici. To ascertain the presence or absence of TEs across the genome, we developed a robust pipeline (Figure 1A). In summary, we called TE insertions by identifying reads mapping to both a TE sequence and a specific location in the reference genome. Then, we assessed the minimum sequencing coverage to reliably recover TE insertions and removed 11 genomes with an average read depth below 15× (Figure 1B). We tested for evidence of TEs using read depth at target site duplications (Figure 1C) and scanned the genome for mapped reads indicating gaps at TE loci (Figure 1D). We found robust evidence for a total of 18,864 TE insertions grouping into 2465 individual loci. Of these loci, 35.5% (n = 876) have singleton TEs (i.e., this locus is only present in one isolate: Figure 2A, Figure 2—source data 3). An overwhelming proportion of loci (2345 loci or 95.1%) have a TE frequency below 1%. Singleton TE insertions in particular can be the product of spurious Illumina read mapping errors (Nakamura et al., 2011). To assess the reliability of the detected singletons, we focused on seven isolates for which PacBio long-read data was available (Badet et al., 2020). Aligned PacBio reads confirmed the exact location of 71% (22 of 31 singleton insertions among seven isolates; see Materials and methods for further details). We found no significant difference in read coverage between confirmed and unconfirmed singleton insertions (Figure 2—figure supplement 1B,C and Figure 2—source data 2).

Figure 1. — (B) Read depth downsampling analysis for one isolate per population with an average coverage of the population. The vertical black line indicates the coverage at which on average 90% of the maximally detectable variants were recovered. Dashed black lines indicate the standard error. The threshold for a minimal mean coverage was set at 15× (red line). (C) Validation of insertions absent in the reference genome. (i) TE insertions that are not present in the reference genome show a duplication of the target site and the part of the reads that covers the TE will not be mapped against the reference genome. We thus expect reads to map to the TE surrounding region and the target site duplication but not the TE itself. At the target site, a local duplication of read depth is expected. (ii) We selected all reads in an interval of 100 bp up- and downstream including the target site duplication to detect deviations in the number of reads terminating near the target site duplication. (D) Validation of insertions present in the reference genome. (i) Analyses read coverage at target site duplications. (ii) Decision map if a TE should be kept as a true insertion or rejected as a false positive. Only predicted TE insertions that overlap evidence of split reads were kept as TE insertions in downstream analyses. (E) Singleton validation using long-read PacBio sequencing. (i) Analysis if TE insertions overlap with a detected insertion/deletion locus (Badet et al., 2021). (ii) Homology search of the TE insertion flanking sequences based on the reference genome against PacBio reads. In addition, the consensus sequence of the inserted TE was used for matches between the flanks.

Figure 1—source data 1. TE insertion validations for non-reference copies.

elife-69249-fig1-data1.txt^{(2MB, txt)}

Figure 1—source data 2. TE consensus sequences.

elife-69249-fig1-data2.txt^{(205.6KB, txt)}

Figure 1—figure supplement 1. — (B) Read depth downsampling analysis for one isolate per population with an average coverage of the population. The vertical black line indicates the coverage at which on average 90% of the maximally detectable variants were recovered. Dashed black lines indicate the standard error. The threshold for a minimal mean coverage was set at 15× (red line). (C) Validation of insertions absent in the reference genome. (i) TE insertions that are not present in the reference genome show a duplication of the target site and the part of the reads that covers the TE will not be mapped against the reference genome. We thus expect reads to map to the TE surrounding region and the target site duplication but not the TE itself. At the target site, a local duplication of read depth is expected. (ii) We selected all reads in an interval of 100 bp up- and downstream including the target site duplication to detect deviations in the number of reads terminating near the target site duplication. (D) Validation of insertions present in the reference genome. (i) Analyses read coverage at target site duplications. (ii) Decision map if a TE should be kept as a true insertion or rejected as a false positive. Only predicted TE insertions that overlap evidence of split reads were kept as TE insertions in downstream analyses. (E) Singleton validation using long-read PacBio sequencing. (i) Analysis if TE insertions overlap with a detected insertion/deletion locus (Badet et al., 2021). (ii) Homology search of the TE insertion flanking sequences based on the reference genome against PacBio reads. In addition, the consensus sequence of the inserted TE was used for matches between the flanks.

Figure 1—source data 1. TE insertion validations for non-reference copies.

elife-69249-fig1-data1.txt^{(2MB, txt)}

Figure 1—source data 2. TE consensus sequences.

elife-69249-fig1-data2.txt^{(205.6KB, txt)}

Figure 2. — (A) Allele frequencies of the TE insertions across all isolates. (B) TE insertions per Mb on core chromosomes (dark) and accessory chromosomes (light). Dashed lines represent mean values. Blue: global mean of 75.65 insertions/Mb, dark: core chromosome mean of 58 TEs/Mb, light: accessory chromosome mean of 102.24 insertions/Mb. (C) Number of TE insertions per family. (D) TE frequencies among isolates and copy numbers across the genome. The blue line indicates the maximum number of isolates (n = 284). (E) Allele frequency distribution of TE insertions into introns and exons. (F) Number of TE insertions within 1 kb up- and downstream of genes on core chromosomes including introns and exons (100 bp windows). The blue arrow indicates a gene schematic with exons and an intron, the green triangles indicate TE insertions. The dotted blue line indicates no deviation from the expected value (i.e., mean number of TEs per window).

Figure 2—source data 1. TEs in reference.

elife-69249-fig2-data1.txt^{(1.5KB, txt)}

Figure 2—source data 2. Presence absence matrix TE loci.

elife-69249-fig2-data2.txt^{(87.9MB, txt)}

Figure 2—source data 3. Singletons.

elife-69249-fig2-data3.txt^{(2.9KB, txt)}

Figure 2—figure supplement 1. — (A) Allele frequencies of the TE insertions across all isolates. (B) TE insertions per Mb on core chromosomes (dark) and accessory chromosomes (light). Dashed lines represent mean values. Blue: global mean of 75.65 insertions/Mb, dark: core chromosome mean of 58 TEs/Mb, light: accessory chromosome mean of 102.24 insertions/Mb. (C) Number of TE insertions per family. (D) TE frequencies among isolates and copy numbers across the genome. The blue line indicates the maximum number of isolates (n = 284). (E) Allele frequency distribution of TE insertions into introns and exons. (F) Number of TE insertions within 1 kb up- and downstream of genes on core chromosomes including introns and exons (100 bp windows). The blue arrow indicates a gene schematic with exons and an intron, the green triangles indicate TE insertions. The dotted blue line indicates no deviation from the expected value (i.e., mean number of TEs per window).

Figure 2—source data 1. TEs in reference.

elife-69249-fig2-data1.txt^{(1.5KB, txt)}

Figure 2—source data 2. Presence absence matrix TE loci.

elife-69249-fig2-data2.txt^{(87.9MB, txt)}

Figure 2—source data 3. Singletons.

elife-69249-fig2-data3.txt^{(2.9KB, txt)}

The abundance of singleton TE insertions strongly supports the idea that TEs actively copy into new locations but also indicates that strong purifying selection maintains nearly all TEs at low frequency (Figure 2A). The density of TE loci on accessory chromosomes, which are not shared among all isolates of the species, is almost twice the density found on core chromosomes (102 vs 58 TEs per Mb; Figure 2B, Figure 2—figure supplement 2A). This suggests relaxed selection against TE insertion on the functionally dispensable and gene-poor accessory chromosomes. We found no difference in TE allele frequency distribution between recombination hotspots and the rest of the genome (Figure 2—figure supplement 2B). Similarly, the TE density and the number of insertions did not vary between recombination hotspots and the genomic background (Figure 2—figure supplement 2C).

TEs grouped into 23 families and 11 superfamilies, with 88.2% of all copies belonging to class I/retrotransposons (n = 2175; Figure 2C, Figure 2—figure supplement 3A,B). RLG/Gypsy (n = 1483) and RLC/Copia (n = 623) elements constitute the largest long terminal repeats (LTR) superfamilies. Class II/DNA transposons are dominated by DHH/Helitron (n = 249). As expected, TE families shared among fewer isolates tend to show also lower global copy numbers (i.e., all isolates combined), while TE families that are present in all isolates generally have high global copy numbers (Figure 2D).

We detected 153 loci with TEs inserted into genes with most of the insertions being singletons (44.7%; n = 68) or of very low frequency (Figure 2E). Overall, TE insertions into exonic sequences were less frequent than expected compared to insertions into up- and downstream regions, which is consistent with effective purifying selection (Figure 2F). Insertions into introns were also strongly under-represented, likely due to the small size of most fungal introns (~50–100 bp) and the high probability of disrupting splicing or adjacent coding sequences. We also found that insertions 800–1000 bp away from coding sequences of a focal gene were under-represented. Given the high gene density, with an average spacing between genes of 1.744 kb, TE insertions within 800–1000 bp of a coding gene tend to be near adjacent genes already. Taken together, TEs in the species show a high degree of transposition activity and are subject to strong purifying selection.

Detection of candidate TE loci underlying recent adaptation

The TE transposition activity can generate adaptive genetic variation. To identify the most likely candidate loci, we analyzed insertion frequency variation among populations as an indicator for recent selection. Across all populations, the insertion frequencies differed only weakly with a strong skew toward extremely low F_ST values (mean = 0.0163; Figure 3A,B, Figure 3—figure supplement 1). To further analyze evidence for TE-mediated adaptive evolution, we screened a genome-wide SNP dataset for evidence of selective sweeps using selection scans. We found 16.5% of all TE loci located in regions of selective sweep. Given our population sampling of two population pairs, we tested for adaptive TE insertions in selective sweep regions either in the North American or European population pairs. Hence, we selected loci having low TE insertion frequencies (<5%) in all populations except either the recent North American or European population (>20%) (Figure 3B). Based on these criteria, we obtained seven candidate loci possibly underlying local adaptation (six in North America, one in Europe; Figure 4A, Figure 4—source data 1). All loci carry inserted retrotransposons with four RLG_Luna, one RLG_Mercurius, and one RLG_Deimos.

Figure 3. — (A) Global pairwise F_ST distributions shown across the 21 chromosomes. The red horizontal line indicates the mean F_ST (=0.0163). TEs with a strong local short-term frequency difference among populations are highlighted (blue: increase in Europe; green: increase in North America). (B) Allele frequency changes between the populations. The same TE loci as in (A) are highlighted. (C) Circos plot describing from the outside to the inside: The black line indicates chromosomal position in Mb. Blue bars indicate the gene density in windows of 100 kb with darker blue representing higher gene density. Red bars indicate the TE density in windows of 100 kb with a darker red representing higher TE density. Green triangles indicate positions of TE insertions with among population F_ST value shown on the y-axis.

Figure 3—figure supplement 1. — (A) Global pairwise F_ST distributions shown across the 21 chromosomes. The red horizontal line indicates the mean F_ST (=0.0163). TEs with a strong local short-term frequency difference among populations are highlighted (blue: increase in Europe; green: increase in North America). (B) Allele frequency changes between the populations. The same TE loci as in (A) are highlighted. (C) Circos plot describing from the outside to the inside: The black line indicates chromosomal position in Mb. Blue bars indicate the gene density in windows of 100 kb with darker blue representing higher gene density. Red bars indicate the TE density in windows of 100 kb with a darker red representing higher TE density. Green triangles indicate positions of TE insertions with among population F_ST value shown on the y-axis.

Figure 4. — (A) Distribution of all extremely differentiated TEs and their distance to the closest gene. Color indicates the superfamily. The stars indicate TE insertions not found in the reference genome. (B) Location of the RLG_Luna TE insertion on chromosome 12 corresponding to its two closest genes. (C) Resistance against azole fungicides among isolates as a function of TE presence or absence. (D) Genomic niche of the RLG_Luna TE insertion on chromosome 12: *F_ST* values for each TE insertion, gene content (blue), TE content (green) and GC content (yellow). The gray section highlights the insertion site. (E) Number of RLG_Luna copies per isolate and population. (F) Frequency changes of RLG_Luna between the two North American populations compared to the other populations. Colors indicate the number of copies per chromosome. (G) Phylogenetic trees of the coding sequences of either the gene encoding the RTA1-like protein or the protein kinase domain. Isolates of the two North American populations and an additional 11 isolates from other populations not carrying the insertion are shown. Blue color indicates TE presence, yellow indicates TE absence.

Figure 4—source data 1. Top loci information.

elife-69249-fig4-data1.txt^{(3.1KB, txt)}

Figure 4—figure supplement 1. — (A) Distribution of all extremely differentiated TEs and their distance to the closest gene. Color indicates the superfamily. The stars indicate TE insertions not found in the reference genome. (B) Location of the RLG_Luna TE insertion on chromosome 12 corresponding to its two closest genes. (C) Resistance against azole fungicides among isolates as a function of TE presence or absence. (D) Genomic niche of the RLG_Luna TE insertion on chromosome 12: *F_ST* values for each TE insertion, gene content (blue), TE content (green) and GC content (yellow). The gray section highlights the insertion site. (E) Number of RLG_Luna copies per isolate and population. (F) Frequency changes of RLG_Luna between the two North American populations compared to the other populations. Colors indicate the number of copies per chromosome. (G) Phylogenetic trees of the coding sequences of either the gene encoding the RTA1-like protein or the protein kinase domain. Isolates of the two North American populations and an additional 11 isolates from other populations not carrying the insertion are shown. Blue color indicates TE presence, yellow indicates TE absence.

Figure 4—source data 1. Top loci information.

elife-69249-fig4-data1.txt^{(3.1KB, txt)}

One TE insertion is 3815 bp downstream of a gene encoding an RTA1-like protein, which can function as transporters with a transmembrane domain and have been associated with resistance against several antifungal compounds (Soustre et al., 1996). The insertion is also 5785 bp upstream of a gene encoding a protein kinase domain (Figure 4B). The TE insertion was not detected in the Middle East or the two European populations and was at low frequencies in the Australian (3.7%) and North American 1990 (1.7%) populations, but increased to 53% of all isolates in the North American 2015 population (fixation index F_ST = 0.42; Figure 4—source data 1). Isolates that carry the insertion show a significantly higher resistance to azole antifungal compounds (Figure 4C). The TE is in the subtelomeric region of chromosome 12, with a moderate GC content, a low TE, and a high gene density (Figure 4D). The TE belongs to the family RLG_Luna, which shows a substantial burst across different chromosomes within the species (Figure 4E,F). We found no association between the phylogenetic relationships among isolates based on the two closest genes and the presence or absence of the TE insertion (Figure 4G). A second candidate adaptive TE insertion belongs to the RLG_Mercurius family and is located between two genes of unknown function (Figure 4—figure supplement 1). A third potentially adaptive TE insertion of a RLC_Deimos is 229 bp upstream of a gene encoding a SNARE domain protein and 286 bp upstream of a gene encoding a flavin amine oxidoreductase. Furthermore, the TE is inserted in a selective sweep region (Figure 4—figure supplement 1). SNARE domains play a role in vesicular transport and membrane fusion (Bonifacino and Glick, 2004). An additional four candidates for adaptive TE insertions belong to RLG_Luna and were located distantly to genes (Figure 4—figure supplement 1). We experimentally tested whether the TE insertions in proximity to genes were associated with higher levels of fungicide resistance. For this, we measured growth rates of the fungal isolates in the presence or absence of an azole fungicide widely deployed against the pathogen. We found that the insertion of TEs at two loci was positively associated with higher levels of fungicide resistance, suggesting that the adaptation was mediated by the TE (Figure 4C, Figure 4—figure supplement 1).

Population-level expansions in TE content

If TE insertion dynamics are largely neutral across populations, TE frequencies across loci should reflect neutral population structure. To test this, we performed a principal component analysis based on a set of six populations on four continents that represent the global genetic diversity of the pathogen (Figure 5A) and 900,193 genome-wide SNPs (Figure 5B). The population structure reflected the demographic history of the pathogen with clear continental differentiation and only minor within-site differentiation. To account for the lower number of TE loci, we performed an additional principal component analysis using a random SNP set of similar size to the number of TE loci. The reduced SNP set retained the geographic signal of the broader set of SNPs (Figure 5C). In stark contrast, TE frequencies across loci showed only weak clustering by geographic origin with the Australian population being the most distinct (Figure 5D). We found a surprisingly strong differentiation of the two North American populations sampled at a 25-year interval in the same field in Oregon.

Figure 5. — (A) Sampling locations of the six populations. Middle East represents the region of origin of the pathogen. In North America, the two populations were collected at an interval of 25 years in the same field in Oregon. In Europe, two populations were collected at an interval of 17 years from two fields in Switzerland <20 km apart. Dark arrows indicate the historic colonization routes of the pathogen. (B) Principal component analysis (PCA) of 284 *Zymoseptoria tritici* isolates, based on 900,193 genome-wide SNPs. (C) PCA of a reduced SNP data set with randomly selected 203 SNPs matching approximately the number of analyzed TE loci. (D) PCA based on 193 TE insertion loci. Loci with allele frequency <5% are excluded.

Figure 5—source data 1. Isolates.

elife-69249-fig5-data1.txt^{(18.3KB, txt)}

Unusual patterns in population differentiation at TE loci suggests that TE activity may substantially vary across populations (Figure 6, Figure 4—source data 1). To analyze this, we first identified the total TE content across all loci per isolate. We found generally lower TE numbers in the Middle Eastern population from Israel (Figure 6A–C, Figure 6—figure supplement 1), which is close to the pathogen’s center of origin (Stukenbrock et al., 2007). Populations that underwent at least one migration bottleneck showed a substantial burst of TEs across all major superfamilies. These populations included the two populations from Europe collected in 1999 and 2016 and the North American population from 1990, as well as the Australian population. We found a second stark increase in TE content in the North American population sampled in 2015 at the same site as the population from 1990. Strikingly, the isolate with the lowest number of analyzed TEs collected in 2015 was comparable to the isolate with the highest number of TEs at the same site in 1990. We tested whether sequencing coverage could explain variation in the detected TEs across isolates, but we found no meaningful association (Figure 2—figure supplement 3C). We analyzed whether the population-specific expansions were correlated with shifts in the frequency spectrum of TEs in the populations (Figure 6D). We found that the first step of expansions observed in Europe compared to the Middle East (Israel) was associated with an upwards shift in allele frequencies. This is consistent with transposition activity creating new copies in the genomes and stronger purifying selection in the Middle East. Similarly, the North American populations showed also signatures consistent with relaxation of selection against TEs (i.e., fewer low-frequency TEs). We found a significant difference (two-sample Kolmogorov–Smirnov test, two-sided) in the curve shapes between the population from the Middle East and North America 2015 (Figure 6—source data 1). We analyzed variation in TE copy numbers across families and found that the expansions were mostly driven by RLG elements including the families Luna, Sol, and Venus, the RLC family Deimos, and the LINE family Lucy (Figure 6E, Figure 6—figure supplement 2). We also found a North American–specific burst in DHH elements of the family Ada (increase from 4.6 to 6.1 copies on average per isolate), an increase specific to Swiss populations in LINE elements, and an increase in RLC elements in the Australian and the two North American populations. Analyses of complete Z. tritici reference-quality genomes that include isolates from the Israel, Australia, Switzerland (1999), and North American (1990) population revealed high TE contents in Australia and North America (Oregon 1990) (Badet et al., 2020). The reference-quality genomes confirmed also that the increase in TEs was driven by LINE, RLG, and RLC families in Australia and DHH, RLG, and RLC families in North America (Badet et al., 2020).

Figure 6—figure supplement 1. — (A) Total TE copies per isolate. Colors identify TE superfamilies. (B) TE copies per family and (C) superfamily. (D) TE insertion frequency spectrum per population. The curve fitting was performed with a self-starting Nls asymptomatic regression model (E). TE family copy numbers per isolate.

Figure 6—source data 1. Kolmogorof–Smirnov.

elife-69249-fig6-data1.txt^{(838B, txt)}

TE-mediated genome size expansions

The combined effects of actively copying TE families and relaxed purifying selection lead to an accumulation of new TE insertions in populations. Consequently, mean genome sizes in populations should increase over generations. We estimated the cumulative length of TE insertions based on the length of the corresponding TE consensus sequences and found a strong increase in the total TE length in populations outside the Middle East center of origin and a second increase between the two North American populations (Figure 7—figure supplement 1A). To test for incipient genome expansions within the species, we first assembled genomes of all 284 isolates included in the study. Given the limitations of short-read assemblies, we implemented corrective measures to compensate for potential variation in assembly qualities. We corrected for variation in the GC content of different sequencing datasets by downsampling reads to generate balanced sequencing read sets prior to assembly (see Materials and methods). We also excluded all reads mapping to accessory chromosomes because different isolates are known to differ in the number of these chromosomes. Genome assemblies were checked for completeness by retrieving the phylogenetically conserved BUSCO genes (Figure 7A). Genome assemblies across different populations carry generally >99% complete BUSCO gene sets, matching the completeness of reference-quality genomes of the same species (Badet et al., 2020). The completeness of the assemblies showed no correlation with either TE or GC content of the genomes. GC content was inversely correlated with genome size consistent with the expansion of repetitive regions having generally low GC content (Figure 7B). We found that the core genome size varied substantially among populations with the Middle East, Australia, as well as the two older European and North American populations having the smallest core genome sizes (Figure 7C). We found a notable increase in core genome size in both the more recent European and North American populations. The increase in core genome size is positively correlated with the count and cumulative length of all inserted TEs (Figure 7D, E and G) and negatively correlated with the genome-wide GC content (Figure 7F,G, Benjamini and Speed, 2012). Hence, core genome size shows substantial variation within the species matching the recent expansion in TEs across continents. We found the most variable genome sizes in the more recent North American population (Figure 7—figure supplement 1B). Finally, we contrasted variation in genome size with the detected TE insertion dynamics. For this, we assessed the variable genome segment as the difference between the smallest and largest analyzed core genome. To reflect TE dynamics, we calculated the cumulative length of all detected TE insertions in any given genome. We found that the cumulative length of inserted TEs represents between 4.8% and 184% of the variable genome segment defined for the species or 0.2–2.6% of the estimated genome size per isolate (Figure 7—figure supplement 1C,D).

Figure 7. — (A) BUSCO completeness variation among genome assemblies. Black lines indicate the mean genome size per population. (B) Genome-wide GC content variation. (C) Core genome size variation among the isolates of the populations (excluding accessory chromosomes). (D) Correlation of core genome size and number of detected TEs. (E) Correlation of core genome size and the cumulative length of all TEs detected as inserted. (F) Correlation of core genome size and genome-wide GC content. (G) Spearman correlation matrix of BUSCO completeness, core genome size, number of detected TEs, and genome-wide GC content.

Figure 7—figure supplement 1. — (A) BUSCO completeness variation among genome assemblies. Black lines indicate the mean genome size per population. (B) Genome-wide GC content variation. (C) Core genome size variation among the isolates of the populations (excluding accessory chromosomes). (D) Correlation of core genome size and number of detected TEs. (E) Correlation of core genome size and the cumulative length of all TEs detected as inserted. (F) Correlation of core genome size and genome-wide GC content. (G) Spearman correlation matrix of BUSCO completeness, core genome size, number of detected TEs, and genome-wide GC content.

Discussion

TEs play a crucial role in generating adaptive genetic variation within species but are also drivers of deleterious genome expansions. We analyzed the interplay of TEs with selective and neutral processes including population differentiation and incipient genome expansions. TEs have substantial transposition activity in the genome but are strongly counterselected and are maintained at low frequency. TE dynamics showed distinct trajectories across populations with more recently established populations having higher TE content and a concurrent expansion of the genome.

Recent selection acting on TE insertions

TE frequencies in the species show a strong skew toward singleton insertions across populations. However, our short read based analyses are possibly skewed toward over-counting singletons as indicated by independent long-read mapping evaluations. Nevertheless, the skew toward low-frequency TE insertions indicates both that TEs are undergoing transposition and that purifying selection maintains frequencies at a low level. Similar effects of selection on active TEs were observed across plants and animals, including Drosophila melanogaster and Brachypodium distachyon (Cridland et al., 2013; Stritt et al., 2018; Luo et al., 2020). TE insertions were under-represented in or near coding regions, showing a stronger purifying selection against TEs inserting into genes. Coding sequences in the Z. tritici genome are densely packed with an average distance of only ~1 kb (Goodwin et al., 2011). Consistent with this high gene density, TE insertions were most frequent at a distance of 200–400 bp away from coding sequences. A rapid decay in linkage disequilibrium in the Z. tritici populations (Croll et al., 2015; Hartmann et al., 2018) likely contributed to the efficiency of removing deleterious insertions. Some TE superfamilies have preferred insertion sites in coding regions and transcription start sites (Miyao et al., 2003; Fu et al., 2013; Gilly et al., 2014; Quadrana et al., 2016). Hence, some heterogeneity in the observed insertion site distribution across the genome is likely due to insertion preferences of individual TEs. We also found evidence for positive selection acting on TEs with the strongest candidate locus being a TE insertion on chromosome 12. This locus showed a frequency increase only in the more recent North American population, which experienced the first systematic fungicide applications and subsequent emergence of fungicide resistance in the decade prior to the last sampling (Estep et al., 2015). The nearest gene encodes a RTA1-like protein, a transmembrane exporter that is associated with resistance toward different stressors, including antifungal compounds, and shows strong copy number variation in several fungi (Soustre et al., 1996; Rogers and Barker, 2003; Sirisattha et al., 2004; Ali et al., 2013; Yew et al., 2016; Liang et al., 2018). Hence, the TE insertion may have positively modulated RTA1 expression to resist antifungals.

Transposition activity in a genome and counteracting purifying selection are expected to establish an equilibrium over evolutionary time (Charlesworth and Charlesworth, 2009). However, temporal bursts of TE families and changes in population size due to bottlenecks or founder events are likely to shift the equilibrium. Despite purifying selection, we were able to detect signatures of positive selection by scanning for short-term population frequency shifts. Population genomic datasets can be used to identify the most likely candidate loci underlying recent adaptation. The shallow genome-wide differentiation of Z. tritici populations provides a powerful background to test for outlier loci (Hartmann et al., 2018). We found the same TE families to have experienced genome-wide copy number expansions, suggesting that the availability of adaptive TE insertions may be a by-product of TE bursts in individual populations.

Population-level TE invasions and relaxed selection

Across the surveyed populations from four continents, we identified substantial variation in TE counts per genome. The increase in TEs matches the global colonization history of the pathogen with an increase in TE copies in more recently established populations (Zhan et al., 2003; Stukenbrock et al., 2007). Compared to the Israeli population located nearest the center of origin in the Middle East, the European populations showed a threefold increase in TE counts. The Australian and North American populations established from European descendants retained high TE counts. We identified a second increase at the North American site where TE counts nearly doubled again over a 25 year period. Compared to the broader increase in TEs from the Middle East, the second expansion at the North American site was driven by a small subset of TE families alone. Analyses of completely assembled reference-quality genomes from the same populations confirmed that genome expansions were primarily driven by the same TE families belonging to the RLG, RLC, and DHH superfamilies (Badet et al., 2020). Consistent with the contributions from individual TEs, we found that the first expansion in Europe led to an increase in low-frequency variants, suggesting higher transposition activity of many TEs in conjunction with strong purifying selection. The second expansion at the North American site shifted TE frequencies upwards, suggesting relaxed selection against TEs. The population-level context of TEs in Z. tritici shows how heterogeneity in TE control interacts with demography to determine extant levels of TE content and, ultimately, genome size.

TE invasion dynamics underpins genome size expansions

The number of detected TEs was closely correlated with core genome size; hence, genome size expansions were at least partly caused by the very recent proliferation of TEs. Genome assemblies of large eukaryotic genomes based on short-read sequencing are often fragmented and contain chimeric sequences (Nagarajan and Pop, 2013). Focusing on the less repetitive core chromosomes in the genome of Z. tritici reduces such artifacts substantially. Because genome assemblies are the least complete in the most repetitive regions, any under-represented sequences may rather underestimate than overestimate within-species variation in genome size. Hence, we consider the assembly sizes to be a robust correlate of total genome size. The core genome size differences observed across the species range match genome size variation typically observed among closely related species. Among primates, genome size varies by ~70% with ~10% between humans and chimpanzees (Rogers and Gibbs, 2014; Miga et al., 2020). In fungi, genome size varies by several orders of magnitude within phyla but is often highly similar among closely related species (Raffaele and Kamoun, 2012). Interestingly, drastic changes in genome size have been observed in the Blumeria and Pseudocercospora genera where genome size changed by 35–130% between the closest known species (González-Sayer et al., 2021; Frantzeskakis et al., 2018). Beyond analyses of TE content variation correlating with genome size evolution, proximate mechanisms driving genome expansions are poorly understood. By establishing large population genetic datasets, such as those possible for crop pathogens, analyses of genome size evolution become tractable at the population level.

TEs might not only contribute to genome expansion directly by adding length through additional copies, but also by increasing the rate of chromosomal rearrangements and ectopic recombination (Bourque et al., 2018; Blommaert, 2020). However, TEs are not the only repetitive elements that can lead to a genome size expansion. In Arabidopsis thaliana genomes, the 45 S rDNA has been shown to have the strongest impact on genome size variation, followed by 5 S rDNA variation, and contributions by centromeric repeats and TEs (Long et al., 2013). In conjunction, recent work demonstrates how repetitive sequences are drivers of genome size evolution over short evolutionary timescales.

The activity of TEs is controlled by complex selection regimes within species. Actively transposing elements may accelerate genome evolution and underpin expansions. Hence, genomic defenses should evolve to efficiently target recently active TEs. Here, we show that TE activity and counteracting genomic defenses have established a tenuous equilibrium across the species range. We show that population subdivisions are at the origin of highly differentiated TE content within a species matching genome size changes emerging over the span of only decades and centuries. In conclusion, population-level analyses of genome size can recapitulate genome expansions typically observed across much deeper time scales providing fundamentally new insights into genome evolution.

Materials and methods

Fungal isolate collection and sequencing

We analyzed 295 Z. tritici isolates covering six populations originating from four geographic locations and four continents (Figure 1—source data 1), including: Middle East 1992 (n = 30 isolates, Nahal Oz, Israel), Australia 2001 (n = 27, Wagga Wagga), Europe 1999 (n = 33, Berg am Irchel, Switzerland), Europe 2016 (n = 52, Eschikon, ca. 15 km from Berg am Irchel, Switzerland), and North America 1990 and 2015 (n = 56 and n = 97, Willamette Valley, Oregon; McDonald et al., 1996; Linde et al., 2002; Zhan et al., 2002; Zhan et al., 2003; Zhan et al., 2005). Illumina short-read data from the Middle Eastern, Australian, European 1999, and North American 1990 populations were obtained from the NCBI Sequence Read Archive (SRA) under the BioProject PRJNA327615 (Hartmann et al., 2017). For the Switzerland 2016 and Oregon 2015 populations, asexual spores were harvested from infected wheat leaves from naturally infected fields and grown in YSB liquid media including 50 mgL^–1 kanamycin and stored in silica gel at −80°C. High-quality genomic DNA was extracted from liquid cultures using the DNeasy Plant Mini Kit from Qiagen (Venlo, The Netherlands). The isolates were sequenced on an Illumina HiSeq in paired-end mode and raw reads were deposited at the NCBI SRA under the BioProject PRJNA596434.

TE insertion detection

The quality of Illumina short reads was determined with FastQC version 0.11.5 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) (Figure 1A). To remove spuriously sequenced Illumina adaptors and low-quality reads, we trimmed the sequences with Trimmomatic version 0.36, using the following filter parameters: illuminaclip:TruSeq3-PE-2.fa:2:30:10 leading:10 trailing:10 slidingwindow:5:10 minlen:50 (Bolger et al., 2014). We created repeat consensus sequences for TE families (sequences are available on https://github.com/crolllab/datasets (copy archived at swh:1:rev:3647456a2f7ed986b690501d690f09d02d3f586e); Laboratory of Evolutionary Genetics @ UNINE, 2021; Figure 1—source data 2) in the complete reference genome IPO323 (Goodwin et al., 2011) with RepeatModeler version open-4.0.7 (http://www.repeatmasker.org/RepeatModeler/) based on the RepBase Sequence Database and de novo (Bao et al., 2015). TE classification into superfamilies and families was based on an approach combining detection of conserved protein sequences and tools to detect non-autonomous TEs (Badet et al., 2020). To detect TE insertions, we used the R-based tool ngs_te_mapper version 79ef861f1d52cdd08eb2d51f145223fad0b2363c integrated into the McClintock pipeline version 20cb912497394fabddcdaa175402adacf5130bd1, using bwa version 0.7.4-r385 to map Illumina short reads, samtools version 0.1.19 to convert alignment file formats and R version 3.2.3 (Li and Durbin, 2009; Li et al., 2009; Linheiro and Bergman, 2012; R Development Core Team, 2017; Nelson et al., 2017).

Downsampling analysis

We performed a downsampling analysis to estimate the sensitivity of the TE detection with ngs_te_mapper based on variation in read depth. We selected one isolate per population matching the average coverage of the population. We extracted the per-base pair read depth with the genomecov function of bedtools version 2.27.1 and calculated the genome-wide mean read depth (Quinlan and Hall, 2010). The number of reads in the original fastq file was reduced in steps of 10% to simulate the impact of reduced coverage. We analyzed each of the obtained reduced read subsets with ngs_te_mapper using the same parameters as described above. The correlation between the number of detected insertions and the read depth was visualized using the function nls with model SSlogis in R and visualized with ggplot2 (Wickham, 2016). The number of detected TEs increased with the number of reads until reaching a plateau indicating saturation (Figure 1B). Saturation was reached at a coverage of approximately 15×; hence, we retained only isolates with an average read depth above 15× for further analyses. We thus excluded one isolate from the Oregon 2015 population and 10 isolates from the Switzerland 2016 population.

Validation procedure for predicted TE insertions

ngs_te_mapper detects the presence but not the absence of a TE at any given locus. We devised additional validation steps to ascertain both the presence and the absence of a TE across all loci in all individuals. TEs absent in the reference genome were validated by re-analyzing mapped Illumina reads. Reads spanning both parts of a TE sequence and an adjacent chromosomal sequence should only map to the reference genome sequence and cover the target site duplication of the TE (Figure 1C). We used bowtie2 version 2.3.0 with the parameter --very-sensitive-local to map Illumina short reads of each isolate on the reference genome IPO323 (Langmead and Salzberg, 2012). Mapped Illumina short reads were then sorted and indexed with samtools, and the resulting bam file was converted to a bed file with the function bamtobed in bedtools. We extracted all mapped reads with an end point located within 100 bp of the target site duplication (Figure 1C). We tested whether the number of reads with a mapped end around the target site duplication significantly deviated if the mapping ended exactly at the boundary. A mapped read ending exactly at the target site duplication boundary is indicative of a split read mapping to a TE sequence absent in the reference genome. To test for the deviation in the number of read mappings around the target site duplication, we used a Poisson distribution and the ppois function in R version 3.5.1 (Figure 1C). We identified a TE as present in an isolate if tests on either side of the target site duplication had a p-value<0.001 (Figure 5—source data 1; Figure 1—figure supplement 1B and Figure 1—source data 1).

For TEs present in the reference genome, we analyzed evidence for spliced junction reads spanning the region containing the TE. Spliced reads are indicative of a discontinuous sequence and, hence, absence of the TE in a particular isolate (Figure 1D). We used STAR version 2.5.3a to detect spliced junction reads with the following set of parameters: --runThreadN 1 --outFilterMultimapNmax 100 --winAnchorMultimapNmax 200 --outSAMmultNmax 100 --outSAMtype BAM Unsorted --outFilterMismatchNmax 5 --alignIntronMin 150 --alignIntronMax 15,000 (Dobin et al., 2013). We then sorted and indexed the resulting bam file with samtools and converted split junction reads with the function bam2hints in bamtools version 2.5.1 (Barnett et al., 2011). We selected loci without overlapping spliced junction reads using the function intersect in bedtools with the parameter -loj -v. We considered a TE as truly absent in an isolate if ngs_te_mapper did not detect a TE and evidence for spliced junction reads were found, indicating that the isolate had no inserted TE in this region. If the absence of a TE could not be confirmed by spliced junction reads, we labeled the genotype as missing. Finally, we excluded TE loci with more than 20% missing data from further investigations (Figure 1D, Figure 1—figure supplement 1C).

Clustering of TE insertions into loci

We identified insertions across isolates as being the same locus if all detected TEs belonged to the same TE family and insertion sites differed by ≤100 bp (Figure 1—figure supplement 2). We used the R package GenomicRanges version 1.28.6 with the functions makeGRangesFromDataFrame and findOverlaps and the R package devtools version 1.13.4 (Lawrence et al., 2013; Wickham and Chang, 2016). We used the R package dplyr version 0.7.4 to summarize datasets (https://dplyr.tidyverse.org/). Population-specific frequencies of insertions were calculated with the function allele.count in the R package hierfstat version 0.4.22 (Goudet, 2005). We conducted a principal component analysis for TE insertion frequencies filtering for a minor allele frequency ≥5%. We also performed a principal component analysis for genome-wide single nucleotide polymorphism (SNP) data obtained from Hartmann et al., 2017 and Singh et al., 2020. As described previously, SNPs were hard-filtered with VariantFiltration and SelectVariants tools integrated in the Genome Analysis Toolkit (GATK) (McKenna et al., 2010). SNPs were removed if any of the following filter conditions applied: QUAL < 250; QD < 20.0; MQ < 30.0; –2 > BaseQRankSum > 2; –2 > MQRankSum > 2; –2 > ReadPosRankSum > 2; FS > 0.1. SNPs were excluded with vcftools version 0.1.17 and plink version 1.9 requiring a genotyping rate > 90% and a minor allele frequency > 5% (https://www.cog-genomics.org/plink2, Chang et al., 2015). Finally, we converted tri-allelic SNPs to bi-allelic SNPs by recoding the least frequent allele as a missing genotype. Principal component analysis was performed using the gdsfmt and SNPRelate packages in R (Zheng et al., 2012; Zheng et al., 2017). For a second principal component analysis with a reduced set of random markers, we randomly selected SNPs with vcftools and the following set of parameters: --maf 0.05 –thin 200,000 to obtain an approximately equivalent number of SNPs as TE loci.

Evaluation of singleton insertions

To evaluate the reliability of singleton TE insertion loci, we analyzed singleton loci in isolates for which we had both Illumina datasets and complete reference-quality genomes (Badet et al., 2020). From a set of 19 long-read PacBio reference genomes spanning the global distribution of Z. tritici, one isolate each from Australia, Israel, North America (1990) and four isolates from Europe (1999) were also included in the TE insertion screening. To assess the reliability of singleton TE insertions, we first investigated structural variation analyses among the reference genomes (Badet et al., 2021: Supplementary Data 1 and 2). The structural variation was called both based on split read mapping of PacBio reads and pairwise whole-genome alignments. Using bedtools intersect, we recovered for the 31 singleton TE loci in the seven analyzed genomes a total of 17 loci showing either an indel, translocation, copy number polymorphism, duplication, inverted duplication, inversion, or inverted translocation at the same location. We visually inspected the PacBio read alignment bam files against the IPO323 reference genome using IGV version 2.4.16 (Robinson et al., 2011), and found a typical coverage increase at the target site duplication, with most read mappings interrupted at the target site duplication as expected for an inserted TE. For the 14 remaining TE loci, we extracted the region of the predicted insertion and padded the sequence on both ends with an additional 500 bp using samtools faidx. We used blast to identify a homologous region in the assembled reference-quality genomes. Matching regions were inspected based on blastn for the presence of a TE sequence matching the TE family originally detected at the locus. With this second approach, we confirmed an additional five singletons to be true insertions. Both methods combined produced supportive evidence for 22 of 31 singleton insertions (71%). We calculated the read coverage after mapping to the reference genome IPO323 with bedtools genomecov for each PacBio long-read dataset and calculated mean coverage for 500 bp regions around singleton TE insertions.

Population differentiation in TE frequencies

We calculated Nei’s fixation index (F_ST) between pairs of populations using the R packages hierfstat and adegenet version 2.1.0 (Jombart, 2008; Jombart and Ahmed, 2011). To understand the chromosomal context of TE insertion loci across isolates, we analyzed draft genome assemblies. We generated de novo genome assemblies for all isolates using SPAdes version 3.5.0 with the parameter --careful and a kmer range of ‘21, 29, 37, 45, 53, 61, 79, 87’ (Bankevich et al., 2012). We used blastn to locate genes adjacent to TE insertion loci on genomic scaffolds of each isolate. We then extracted scaffold sequences surrounding 10 kb up- and downstream of the localized gene with the function faidx in samtools and reverse complemented the sequence if needed. Then, we performed multiple sequence alignments for each locus across all isolates with MAFFT version 7.407 with parameter --maxiterate 1000 (Katoh and Standley, 2013). We performed visual inspections to ensure correct alignments across isolates using Jalview version 2.10.5 (Waterhouse et al., 2009). To generate phylogenetic trees of individual gene or TE loci, we extracted specific sections of the alignment using the function extractalign in EMBOSS version 6.6.0 (Rice et al., 2000) and converted the multiple sequence alignment into PHYLIP format with jmodeltest version 2.1.10 using the -getPhylip parameter. We then estimated maximum likelihood phylogenetic trees with the software PhyML version 3.0, the K80 substitution model, and 100 bootstraps on the ATGC South of France bioinformatics platform (Guindon and Gascuel, 2003; Guindon et al., 2010; Darriba et al., 2012). Bifurcations with a supporting value lower than 10% were collapsed in TreeGraph version 2.15.0–887 beta, and trees were visualized as circular phylograms in Dendroscope version 2.7.4 (Huson et al., 2007; Stöver and Müller, 2010). For loci showing complex rearrangements, we generated synteny plots using 19 completely sequenced genomes from the same species using the R package genoplotR version 0.8.9 (Guy et al., 2010; Badet et al., 2020). We calculated the population-specific allele frequency for TE loci and estimated the exponential decay curve with a self-starting Nls asymptomatic regression model nls(p_loci ~ SSasymp(p_round, Asym, R0, lrc)) in R.

We analyzed signatures of selective sweeps based on genome-wide SNPs using the extended haplotype homozygosity (EHH) tests implemented in the R package REHH (Sabeti et al., 2007; Gautier and Vitalis, 2012). We analyzed within-population signatures based on the iHS statistic and chose a maximum gap distance of 20 kb. We also analyzed cross-population signatures based on the XP-EHH statistic for the following two population pairs: North America 1990 versus North America 2015, Europe 1999 versus Europe 2016. We defined significant selective sweeps as being among the 99.9th percentile outliers of the iHS and XP-EHH statistics. Significant SNPs at less than 5 kb were clustered into a single selective sweep region adding ±2.5 kb. Finally, we analyzed whether TE loci in the population pairs were within 10 kb of a region identified as a selective sweep by XP-EHH using the function intersect from bedtools.

Genomic location of TE insertions

To characterize the genomic environment of TE insertion loci, we split the reference genome into non-overlapping windows of 10 kb using the function splitter from EMBOSS. TEs were located in the reference genome using RepeatMasker providing consensus sequences from RepeatModeler (http://www.repeatmasker.org/). To analyze coding sequence, we retrieved the gene annotation for the reference genome (Grandaubert et al., 2015). We estimated the percentage covered by genes or TEs per window using the function intersect in bedtools. Additionally, we calculated the GC content using the tool get_gc_content (https://github.com/spundhir/RNA-Seq/blob/master/get_gc_content.pl; Sachin, 2021). We extracted the number of TEs present in 1 kb windows around each annotated core gene in the reference genome IPO323, using the function window in bedtools. We calculated the relative distances between each gene and the closest TE with the function bedtools closest. For the TEs inserted into genes, we used the function intersect in bedtools to distinguish intron and exon insertions with the parameters -wo and -v, respectively. TEs that overlap more than one exon were only counted once. For each 100 bp segment in the 1 kb windows as well as for introns and exons, we calculated the mean number of observed TE insertions per base pair. We calculated the mean number of TEs per window and calculated the log2 of the observed number of TE insertions divided by the expected value. We extracted information about recombination hotspots from Croll et al., 2015. This dataset is based on two experimental crosses initiated from isolates included in our analyses (1A5 × 1E4, 3D1 × 3D7). The recombination rates were assessed based on the reference genome IPO323 and analyzed with the R/qtl package in R. We used bedtools intersect to compare both TE density in IPO323 and TE insertion polymorphism with predicted recombination hotspots.

Core genome size estimation

Accessory chromosomes show the presence/absence variation within the species and length polymorphism (Goodwin et al., 2011; Croll et al., 2013) and thus impact genome size. We controlled for this effect by first mapping sequencing reads to the reference genome IPO323 using bowtie2 with --very-sensitive-local settings and retained only reads mapping to any of the 13 core chromosomes using seqtk subseq v1.3-r106 (https://github.com/lh3/seqtk/; Heng, 2021). Furthermore, we found that different sequencing runs showed minor variation in the distribution of the per read GC content. In particular, reads of a GC content lower than 30% were under-represented in the Australian (mean reads <30% of the total readset: 0.05%), North American 1990 (0.07%) and Middle East (0.1%) populations, and higher in the Europe 1999 (1.3%), North American 2015 (3.0%), and Europe 2016 (4.02%) populations (Figure 1—figure supplement 3). Library preparation protocols and Illumina sequencer generations are known factors influencing the recovery of reads of varying GC content (Benjamini and Speed, 2012).

To control a potential bias stemming from this, we subsampled reads based on GC content to create homogeneous datasets. For this, we first retrieved the mean GC content for each read pair using geecee in EMBOSS and binned reads according to GC content. For the bins with a GC content <30%, we calculated the mean proportion of reads from the genome over all samples. We then used seqtk subseq to subsample reads of <30% to adjust the mean GC content among readsets. We generated de novo genome assemblies using the SPAdes assembler version with the parameters --careful and a kmer range of ‘21, 29, 37, 45, 53, 61, 79, 87’. The SPAdes assembler is optimized for the assembly of relatively small eukaryotic genomes. We evaluated the completeness of the assemblies using BUSCO v4.1.1 with the fungi_odb10 gene test set (Simão et al., 2015). We finally ran Quast v5.0.2 to retrieve assembly metrics including scaffolds of at least 1 kb (Mikheenko et al., 2018).

Fungicide resistance assay

To quantify susceptibility toward propiconazole, we used a previously published microtiter plate assay dataset with three replicates performed for each isolate and concentration. Optical density was used to estimate growth rates under different fungicide concentrations (0, 0.00006, 0.00017, 0.0051, 0.0086, 0.015, 0.025, 0.042, 0.072, 0.20, 0.55, 1.5 mg L^–1) (Hartmann et al., 2020). We calculated dose–response curves and estimated the half-maximal lethal concentration EC₅₀ with a four-parameter logistics curve in the R package drc (Ritz and Streibig, 2005).

Acknowledgements

We thank Andrea Sánchez Vallet, Anne C Roulin, Luzia Stalder, Adam Taranto, Emilie Chanclud, and Alice Feurtey for helpful discussions and comments on previous versions of the manuscript. We also thank the three reviewers for very helpful suggestions. We thank C Sarai Reyes-Avila for advice on statistical analyses. DC is supported by the Swiss National Science (grants 31,003A_173265) and the Fondation Pierre Mercier pour la Science.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Daniel Croll, Email: daniel.croll@unine.ch.

Detlef Weigel, Max Planck Institute for Developmental Biology, Germany.

Funding Information

This paper was supported by the following grants:

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung 31003A_173265 to Daniel Croll.
Pierre Mercier pour la science to Daniel Croll.

Additional information

Competing interests

none.

None.

Author contributions

Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing - original draft.

Formal analysis.

Data curation, Supervision.

Data curation, Resources, Supervision.

Resources.

Investigation.

Resources.

Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing - original draft, Writing - review and editing.

Additional files

Transparent reporting form

elife-69249-transrepform1.docx^{(246.4KB, docx)}

Data availability

Sequence data are deposited at the NCBI Sequence Read Archive under the accession numbers PRJNA327615, PRJNA596434 and PRJNA178194. Transposable element consensus sequences are available from https://github.com/crolllab/datasets (copy archived at https://archive.softwareheritage.org/swh:1:rev:3647456a2f7ed986b690501d690f09d02d3f586e).

The following dataset was generated:

Hartmann FE, McDonald BA, Croll D. 2019. Population sequencing of Zymoseptoria tritici in Switzerland and Oregon (USA) NCBI BioProject. PRJNA596434

The following previously published datasets were used:

Hartmann FE, McDonald BA, Croll D. 2016. Population genomics of Zymoseptoria tritici. NCBI BioProject. PRJNA327615

Croll D, McDonald BA. 2012. Zymoseptoria tritici genome sequencing. NCBI BioProject. PRJNA178194

References

Ali SS, Khan M, Mullins E, Doohan FM. Identification of fusarium oxysporum genes associated with lignocellulose bioconversion competency. Bioenergy Research. 2013;7:110–119. doi: 10.1007/s12155-013-9353-0. [DOI] [Google Scholar]
Badet T, Oggenfuss U, Abraham L, McDonald BA, Croll D. A 19-isolate reference-quality global pangenome for the fungal wheat pathogen Zymoseptoria tritici. BMC Biology. 2020;18:12. doi: 10.1186/s12915-020-0744-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Badet T, Fouché S, Hartmann FE, Zala M, Croll D. Machine-learning predicts genomic determinants of meiosis-driven structural variation in a eukaryotic pathogen. Nature Communications. 2021;12:3551. doi: 10.1038/s41467-021-23862-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. Spades: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology. 2012;19:455–477. doi: 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bao W, Kojima KK, Kohany O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mobile DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Barnett DW, Garrison EK, Quinlan AR, Strömberg MP, Marth GT. Bamtools: A C++ API and Toolkit for Analyzing and Managing BAM Files. Bioinformatics. 2011;27:1691–1692. doi: 10.1093/bioinformatics/btr174. [DOI] [PMC free article] [PubMed] [Google Scholar]
Baucom RS, Estill JC, Leebens-Mack J, Bennetzen JL. Natural selection on gene function drives the evolution of LTR retrotransposon families in the rice genome. Genome Research. 2009;19:243–254. doi: 10.1101/gr.083360.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
Benjamini Y, Speed TP. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Research. 2012;40:1–14. doi: 10.1093/nar/gks001. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blommaert J. Genome size evolution: Towards new model systems for old questions. Proceedings. Biological Sciences. 2020;287:20201441. doi: 10.1098/rspb.2020.1441. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bonifacino JS, Glick BS. The mechanisms of vesicle budding and fusion. Cell. 2004;116:153–166. doi: 10.1016/s0092-8674(03)01079-1. [DOI] [PubMed] [Google Scholar]
Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvák Z, Levin HL, Macfarlan TS, Mager DL, Feschotte C. Ten things you should know about transposable elements. Genome Biology. 2018;19:199. doi: 10.1186/s13059-018-1577-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation plink: Rising to the challenge of larger and richer datasets. GigaScience. 2015;4:1–16. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Charlesworth B, Charlesworth D. The population dynamics of transposable elements. Genetical Research. 2009;42:1–27. doi: 10.1017/S0016672300021455. [DOI] [Google Scholar]
Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: From conflicts to benefits. Nature Reviews. Genetics. 2017;18:71–86. doi: 10.1038/nrg.2016.139. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cridland JM, Macdonald SJ, Long AD, Thornton KR. Abundance and distribution of transposable elements in two Drosophila QTL mapping resources. Molecular Biology and Evolution. 2013;30:2311–2327. doi: 10.1093/molbev/mst129. [DOI] [PMC free article] [PubMed] [Google Scholar]
Croll D, Zala M, McDonald BA. Breakage-fusion-bridge cycles and large insertions contribute to the rapid evolution of accessory chromosomes in a fungal pathogen. PLOS Genetics. 2013;9:e1003567. doi: 10.1371/journal.pgen.1003567. [DOI] [PMC free article] [PubMed] [Google Scholar]
Croll D, Lendenmann MH, Stewart E, McDonald BA. The impact of recombination hotspots on genome evolution of a fungal plant pathogen. Genetics. 2015;201:1213–1228. doi: 10.1534/genetics.115.180968. [DOI] [PMC free article] [PubMed] [Google Scholar]
Darriba D, Taboada GL, Doallo R, Posada D. Jmodeltest 2: More models, new heuristics and parallel computing. Nature Methods. 2012;9:772. doi: 10.1038/nmeth.2109. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Star: Ultrafast Universal RNA-SEQ aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
Eichler EE, Sankoff D. Structural dynamics of eukaryotic chromosome evolution. Science. 2003;301:793–797. doi: 10.1126/science.1086132. [DOI] [PubMed] [Google Scholar]
Estep LK, Torriani SFF, Zala M, Anderson NP, Flowers MD, McDonald BA, Mundt CC, Brunner PC. Emergence and early evolution of fungicide resistance in North American populations of Zymoseptoria tritici. Plant Pathology. 2015;64:961–971. doi: 10.1111/ppa.12314. [DOI] [Google Scholar]
Feschotte C. Transposable elements and the evolution of regulatory networks. Nature Reviews Genetics. 2008;9:397–405. doi: 10.1038/nrg2337. [DOI] [PMC free article] [PubMed] [Google Scholar]
Feurtey A, Lorrain C, Croll D, Eschenbrenner C, Freitag M, Habig M, Haueisen J, Möller M, Schotanus K, Stukenbrock EH. Genome compartmentalization predates species divergence in the plant pathogen genus Zymoseptoria. BMC Genomics. 2020;21:588. doi: 10.1186/s12864-020-06871-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fouché S, Badet T, Oggenfuss U, Plissonneau C, Francisco CS, Croll D. Stress-driven transposable element de-repression dynamics in a fungal pathogen. Molecular Biology and Evolution. 2019;37:msz216. doi: 10.1093/molbev/msz216. [DOI] [PubMed] [Google Scholar]
Frantzeskakis L, Kracher B, Kusch S, Yoshikawa-Maekawa M, Bauer S, Pedersen C, Spanu PD, Maekawa T, Schulze-Lefert P, Panstruga R. Signatures of host specialization and a recent transposable element burst in the dynamic one-speed genome of the fungal barley powdery mildew pathogen. BMC Genomics. 2018;19:1–23. doi: 10.1186/s12864-018-4750-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fu Y, Kawabe A, Etcheverry M, Ito T, Toyoda A, Fujiyama A, Colot V, Tarutani Y, Kakutani T. Mobilization of a plant transposon by expression of the transposon-encoded anti-silencing factor. The EMBO Journal. 2013;32:2407–2417. doi: 10.1038/emboj.2013.169. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gautier M, Vitalis R. rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure. Bioinformatics. 2012;28:1176–1177. doi: 10.1093/bioinformatics/bts115. [DOI] [PubMed] [Google Scholar]
Gilly A, Etcheverry M, Madoui M-A, Guy J, Quadrana L, Alberti A, Martin A, Heitkam T, Engelen S, Labadie K, Le Pen J, Wincker P, Colot V, Aury J-M. Te-tracker: Systematic identification of transposition events through whole-genome resequencing. BMC Bioinformatics. 2014;15:377. doi: 10.1186/s12859-014-0377-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
González-Sayer S, Oggenfuss U, García I, Aristizabal F. High-quality genome assembly of pseudocercospora ulei the main threat to natural rubber trees. Genetics and Molecular Biology. 2021 doi: 10.1590/1678-4685-GMB-2021-0051. [DOI] [PMC free article] [PubMed] [Google Scholar]
Goodwin SB, M’barek SB, Dhillon B, Wittenberg AHJ, Crane CF, Hane JK, Foster AJ, Van der Lee TAJ, Grimwood J, Aerts A, Antoniw J, Bailey A, Bluhm B, Bowler J, Bristow J, van der Burgt A, Canto-Canché B, Churchill ACL, Conde-Ferràez L, Cools HJ, Coutinho PM, Csukai M, Dehal P, De Wit P, Donzelli B, van de Geest HC, van Ham RCHJ, Hammond-Kosack KE, Henrissat B, Kilian A, Kobayashi AK, Koopmann E, Kourmpetis Y, Kuzniar A, Lindquist E, Lombard V, Maliepaard C, Martins N, Mehrabi R, Nap JPH, Ponomarenko A, Rudd JJ, Salamov A, Schmutz J, Schouten HJ, Shapiro H, Stergiopoulos I, Torriani SFF, Tu H, de Vries RP, Waalwijk C, Ware SB, Wiebenga A, Zwiers L-H, Oliver RP, Grigoriev IV, Kema GHJ. Finished genome of the fungal wheat pathogen Mycosphaerella graminicola reveals dispensome structure, chromosome plasticity, and stealth pathogenesis. PLOS Genetics. 2011;7:e1002070. doi: 10.1371/journal.pgen.1002070. [DOI] [PMC free article] [PubMed] [Google Scholar]
Goudet J. HierFSTAT, a package for R to compute and test hierarchical f-statistics. Molecular Ecology Notes. 2005;5:184–186. doi: 10.1111/j.1471-8286.2004.00828.x. [DOI] [Google Scholar]
Grandaubert J, Bhattacharyya A, Stukenbrock EH. RNA-seq-based gene annotation and comparative genomics of four fungal grass pathogens in the genus zymoseptoria identify novel orphan genes and species-specific invasions of transposable elements. G3: Genes, Genomes, Genetics. 2015;5:1323–1333. doi: 10.1534/g3.115.017731. [DOI] [PMC free article] [PubMed] [Google Scholar]
Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology. 2003;52:696–704. doi: 10.1080/10635150390235520. [DOI] [PubMed] [Google Scholar]
Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of Phyml 3.0. Systematic Biology. 2010;59:307–321. doi: 10.1093/sysbio/syq010. [DOI] [PubMed] [Google Scholar]
Guy L, Kultima JR, Andersson SGE. GENOPLOTR: Comparative gene and genome Visualization in R. Bioinformatics. 2010;26:2334–2335. doi: 10.1093/bioinformatics/btq413. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hartmann FE, Croll D. Distinct trajectories of massive recent gene gains and losses in populations of a microbial eukaryotic pathogen. Molecular Biology and Evolution. 2017;34:2808–2822. doi: 10.1093/molbev/msx208. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hartmann FE, Sánchez-Vallet A, McDonald BA, Croll D. A fungal wheat pathogen evolved host specialization by extensive chromosomal rearrangements. The ISME Journal. 2017;11:1189–1204. doi: 10.1038/ismej.2016.196. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hartmann FE, McDonald BA, Croll D. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen. Molecular Ecology. 2018;27:2725–2741. doi: 10.1111/mec.14711. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hartmann FE, Vonlanthen T, Singh NK, McDonald MC, Milgate A, Croll D. The Complex Genomic Basis of Rapid Convergent Adaptation to Pesticides across Continents in a Fungal Plant Pathogen. Molecular Ecology; 2020. [DOI] [PubMed] [Google Scholar]
Heng L. SEQTK. 7c04ce7Github. 2021 https://github.com/lh3/seqtk/
Hollister JD, Gaut BS. Epigenetic silencing of transposable elements: A trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Research. 2009;19:1419–1428. doi: 10.1101/gr.091678.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
Huson DH, Richter DC, Rausch C, Dezulian T, Franz M, Rupp R. Dendroscope: An interactive viewer for large phylogenetic trees. BMC Bioinformatics. 2007;8:1–6. doi: 10.1186/1471-2105-8-460. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jiao WB, Schneeberger K. Chromosome-Level Assemblies of Multiple Arabidopsis thaliana Accessions Reveal Hotspots of Genomic Rearrangements. bioRxiv. 2019 doi: 10.1101/738880. [DOI] [PMC free article] [PubMed]
Jombart T. Adegenet: A r package for the multivariate analysis of genetic markers. Bioinformatics. 2008;24:1403–1405. doi: 10.1093/bioinformatics/btn129. [DOI] [PubMed] [Google Scholar]
Jombart T, Ahmed I. ADEGENET 1.3-1: New tools for the analysis of genome-wide SNP data. Bioinformatics. 2011;27:3070–3071. doi: 10.1093/bioinformatics/btr521. [DOI] [PMC free article] [PubMed] [Google Scholar]
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution. 2013;30:772–780. doi: 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kidwell MG. Transposable elements and the evolution of genome size in eukaryotes. Genetica. 2002;115:49–63. doi: 10.1023/a:1016072014259. [DOI] [PubMed] [Google Scholar]
Krishnan P, Meile L, Plissonneau C, Ma X, Hartmann FE, Croll D, McDonald BA, Sánchez-Vallet A. Transposable element insertions shape gene regulation and melanin production in a fungal pathogen of wheat. BMC Biology. 2018;16:1–18. doi: 10.1186/s12915-018-0543-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Laboratory of Evolutionary Genetics @ UNINE Datasets. swh:1:rev:3647456a2f7ed986b690501d690f09d02d3f586eSoftware Heritage. 2021 https://archive.softwareheritage.org/swh:1:rev:3647456a2f7ed986b690501d690f09d02d3f586e
Lai X, Schnable JC, Liao Z, Xu J, Zhang G, Li C, Hu E, Rong T, Xu Y, Lu Y. Genome-wide characterization of non-reference transposable element insertion polymorphisms reveals genetic diversity in tropical and temperate maize. BMC Genomics. 2017;18:1–13. doi: 10.1186/s12864-017-4103-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, Carey VJ. Software for computing and annotating genomic ranges. PLOS Computational Biology. 2013;9:e1003118. doi: 10.1371/journal.pcbi.1003118. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H, Durbin R. Fast and accurate short read alignment with Burrows-wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup The sequence Alignment/map format and samtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liang X, Wang B, Dong Q, Li L, Rollins JA, Zhang R, Sun G. Pathogenic adaptations of colletotrichum fungi revealed by genome wide gene family evolutionary analyses. PLOS ONE. 2018;13:e0196303. doi: 10.1371/journal.pone.0196303. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lim JK. Intrachromosomal rearrangements mediated by hobo transposons in Drosophila melanogaster. PNAS. 1988;85:9153–9157. doi: 10.1073/pnas.85.23.9153. [DOI] [PMC free article] [PubMed] [Google Scholar]
Linde CC, Zhan J, McDonald BA. Population Structure of Mycosphaerella graminicola: From Lesions to Continents. Phytopathology. 2002;92:946–955. doi: 10.1094/PHYTO.2002.92.9.946. [DOI] [PubMed] [Google Scholar]
Linheiro RS, Bergman CM. Whole genome resequencing reveals natural target site preferences of transposable elements in Drosophila melanogaster. PLOS ONE. 2012;7:e30008. doi: 10.1371/journal.pone.0030008. [DOI] [PMC free article] [PubMed] [Google Scholar]
Long Q, Rabanal FA, Meng D, Huber CD, Farlow A, Platzer A, Zhang Q, Vilhjálmsson BJ, Korte A, Nizhynska V, Voronin V, Korte P, Sedman L, Mandáková T, Lysak MA, Seren Ü, Hellmann I, Nordborg M. Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden. Nature Genetics. 2013;45:884–890. doi: 10.1038/ng.2678. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lu L, Chen J, Robb SMC, Okumoto Y, Stajich JE, Wessler SR. Tracking the genome-wide outcomes of a transposable element burst over decades of amplification. PNAS. 2017;114:E10550–E10559. doi: 10.1073/pnas.1716459114. [DOI] [PMC free article] [PubMed] [Google Scholar]
Luo S, Zhang H, Duan Y, Yao X, Clark AG, Lu J. The evolutionary arms race between transposable elements and pirnas in Drosophila melanogaster. BMC Evolutionary Biology. 2020;20:14. doi: 10.1186/s12862-020-1580-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lynch M. The Origins of Genome Architecture. Sunderland MA: Sinauer Associates; 2007. [Google Scholar]
McDonald BA, Mundt CC, Chen R-S. The role of selection on the genetic structure of pathogen populations: Evidence from field experiments with Mycosphaerella graminicola on wheat. Euphytica. 1996;92:73–80. doi: 10.1007/BF00022831. [DOI] [Google Scholar]
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The genome analysis Toolkit: A mapreduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
Meile L, Croll D, Brunner PC, Plissonneau C, Hartmann FE, McDonald BA, Sánchez-Vallet A. A fungal avirulence factor encoded in a highly plastic genomic region triggers partial resistance to septoria tritici blotch. The New Phytologist. 2018;219:1048–1061. doi: 10.1111/nph.15180. [DOI] [PMC free article] [PubMed] [Google Scholar]
Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, Brooks S, Howe E, Porubsky D, Logsdon GA, Schneider VA, Potapova T, Wood J, Chow W, Armstrong J, Fredrickson J, Pak E, Tigyi K, Kremitzki M, Markovic C, Maduro V, Dutra A, Bouffard GG, Chang AM, Hansen NF, Wilfert AB, Thibaud-Nissen F, Schmitt AD, Belton J-M, Selvaraj S, Dennis MY, Soto DC, Sahasrabudhe R, Kaya G, Quick J, Loman NJ, Holmes N, Loose M, Surti U, Risques RA, Graves Lindsay TA, Fulton R, Hall I, Paten B, Howe K, Timp W, Young A, Mullikin JC, Pevzner PA, Gerton JL, Sullivan BA, Eichler EE, Phillippy AM. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020;585:79–84. doi: 10.1038/s41586-020-2547-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018;34:i142–i150. doi: 10.1093/bioinformatics/bty266. [DOI] [PMC free article] [PubMed] [Google Scholar]
Miyao A, Tanaka K, Murata K, Sawaki H, Takeda S, Abe K, Shinozuka Y, Onosato K, Hirochika H. Target site specificity of the tos17 retrotransposon shows a preference for insertion within genes and against insertion in retrotransposon-rich regions of the genome. The Plant Cell. 2003;15:1771–1780. doi: 10.1105/tpc.012559. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nagarajan N, Pop M. Sequence assembly demystified. Nature Reviews. Genetics. 2013;14:157–167. doi: 10.1038/nrg3367. [DOI] [PubMed] [Google Scholar]
Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, Altaf-Ul-Amin M, Ogasawara N, Kanaya S. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Research. 2011;39:e90. doi: 10.1093/nar/gkr344. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nelson MG, Linheiro RS, Bergman CM. Mcclintock: An integrated pipeline for detecting transposable element insertions in whole-genome shotgun sequencing data. G3: Genes, Genomes, Genetics. 2017;7:2763–2778. doi: 10.1534/g3.117.043893. [DOI] [PMC free article] [PubMed] [Google Scholar]
Oliver KR, McComb JA, Greene WK. Transposable elements: Powerful contributors to angiosperm evolution and diversity. Genome Biology and Evolution. 2013;5:1886–1901. doi: 10.1093/gbe/evt141. [DOI] [PMC free article] [PubMed] [Google Scholar]
Omrane S, Sghyer H, Audéon C, Lanen C, Duplaix C, Walker A-S, Fillinger S. Fungicide efflux and the mgmfs1 transporter contribute to the multidrug resistance phenotype in Zymoseptoria tritici Field isolates. Environmental Microbiology. 2015;17:2805–2823. doi: 10.1111/1462-2920.12781. [DOI] [PubMed] [Google Scholar]
Omrane S, Audéon C, Ignace A, Duplaix C, Aouini L, Kema G, Walker AS, Fillinger S. Plasticity of the MFS1 Promoter Leads to Multidrug Resistance in the Wheat Pathogen Zymoseptoria tritici. MSphere. 2017;2:1–42. doi: 10.1128/mSphere.00393-17. [DOI] [PMC free article] [PubMed] [Google Scholar]
Peter M, Kohler A, Ohm RA, Kuo A, Krützmann J, Morin E, Arend M, Barry KW, Binder M, Choi C, Clum A, Copeland A, Grisel N, Haridas S, Kipfer T, LaButti K, Lindquist E, Lipzen A, Maire R, Meier B, Mihaltcheva S, Molinier V, Murat C, Pöggeler S, Quandt CA, Sperisen C, Tritt A, Tisserant E, Crous PW, Henrissat B, Nehls U, Egli S, Spatafora JW, Grigoriev IV, Martin FM. Ectomycorrhizal ecology is imprinted in the genome of the dominant symbiotic fungus cenococcum Geophilum. Nature Communications. 2016;7:1–15. doi: 10.1038/ncomms12662. [DOI] [PMC free article] [PubMed] [Google Scholar]
Petrov DA, Aminetzach YT, Davis JC, Bensasson D, Hirsh AE. Size matters: Non-ltr retrotransposable elements and ectopic recombination in Drosophila. Molecular Biology and Evolution. 2003;20:880–892. doi: 10.1093/molbev/msg102. [DOI] [PubMed] [Google Scholar]
Piegu B, Guyot R, Picault N, Roulin A, Sanyal A, Saniyal A, Kim H, Collura K, Brar DS, Jackson S, Wing RA, Panaud O. Doubling genome size without polyploidization: Dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Research. 2006;16:1262–1269. doi: 10.1101/gr.5290206. [DOI] [PMC free article] [PubMed] [Google Scholar]
Plissonneau C, Stürchler A, Croll D. The evolution of orphan regions in genomes of a fungal pathogen of wheat. MBio. 2016;7:e01231-16. doi: 10.1128/mBio.01231-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
Quadrana L, Bortolini Silveira A, Mayhew GF, LeBlanc C, Martienssen RA, Jeddeloh JA, Colot V. The Arabidopsis thaliana mobilome and its impact at the species level. eLife. 2016;5:e15716. doi: 10.7554/eLife.15716. [DOI] [PMC free article] [PubMed] [Google Scholar]
Quinlan AR, Hall IM. Bedtools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
R Development Core Team . Vienna, Austria: R Foundation for Statistical Computing; 2017. http://www.r-project.org [Google Scholar]
Raffaele S, Kamoun S. Genome evolution in filamentous plant pathogens: Why bigger can be better. Nature Reviews. Microbiology. 2012;10:417–430. doi: 10.1038/nrmicro2790. [DOI] [PubMed] [Google Scholar]
Rice P, Longden I, Bleasby A. EMBOSS: The European Molecular Biology open Software Suite. Trends in Genetics. 2000;16:276–277. doi: 10.1016/s0168-9525(00)02024-2. [DOI] [PubMed] [Google Scholar]
Ritz C, Streibig JC. Bioassay analysis using R. Journal of Statistical Software. 2005;12:1–22. doi: 10.18637/jss.v012.i05. [DOI] [Google Scholar]
Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative Genome Viewer. Nature Biotechnology. 2011;29:24–26. doi: 10.1038/nbt.1754. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rogers PD, Barker KS. Genome-wide expression profile analysis reveals coordinately regulated genes associated with stepwise acquisition of azole resistance in Candida albicans clinical isolates. Antimicrobial Agents and Chemotherapy. 2003;47:1220–1227. doi: 10.1128/AAC.47.4.1220-1227.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rogers J, Gibbs RA. Comparative primate genomics: emerging patterns of genome content and dynamics. Nature Reviews. Genetics. 2014;15:347–359. doi: 10.1038/nrg3707. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rouxel T, Grandaubert J, Hane JK, Hoede C, van de Wouw AP, Couloux A, Dominguez V, Anthouard V, Bally P, Bourras S, Cozijnsen AJ, Ciuffetti LM, Degrave A, Dilmaghani A, Duret L, Fudal I, Goodwin SB, Gout L, Glaser N, Linglin J, Kema GHJ, Lapalu N, Lawrence CB, May K, Meyer M, Ollivier B, Poulain J, Schoch CL, Simon A, Spatafora JW, Stachowiak A, Turgeon BG, Tyler BM, Vincent D, Weissenbach J, Amselem J, Quesneville H, Oliver RP, Wincker P, Balesdent M-H, Howlett BJ. Effector diversification within compartments of the Leptosphaeria maculans genome affected by repeat-induced point mutations. Nature Communications. 2011;2:202. doi: 10.1038/ncomms1189. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, International HapMap Consortium. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MMY, Tsui SKW, Xue H, Wong JT-F, Galver LM, Fan J-B, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier J-F, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok P-Y, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui L-C, Mak W, Song YQ, Tam PKH, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PIW, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe’er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Johnson TA, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CDM, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L’Archevêque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R, Stewart J. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449:913–918. doi: 10.1038/nature06250. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sachin P. RNA-Seq. 3e1c4b1GitHub. 2021 https://github.com/spundhir/RNA-Seq/blob/main/get_gc_content.pl
SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nature Genetics. 1998;20:43–45. doi: 10.1038/1695. [DOI] [PubMed] [Google Scholar]
Shen MR, Batzer MA, Deininger PL. Evolution of the master Alu gene(s. Journal of Molecular Evolution. 1991;33:311–320. doi: 10.1007/BF02102862. [DOI] [PubMed] [Google Scholar]
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351. [DOI] [PubMed] [Google Scholar]
Singh NK, Chanclud E, Croll D. Population-Level deep sequencing reveals the interplay of clonal and sexual reproduction in the fungal wheat pathogen Zymoseptoria tritici. bioRxiv. 2020 doi: 10.1101/2020.07.07.191510. [DOI] [PMC free article] [PubMed]
Sirisattha S, Momose Y, Kitagawa E, Iwahashi H. Toxicity of anionic detergents determined by Saccharomyces cerevisiae microarray analysis. Water Research. 2004;38:61–70. doi: 10.1016/j.watres.2003.08.027. [DOI] [PubMed] [Google Scholar]
Slotkin RK, Martienssen R. Transposable elements and the epigenetic regulation of the genome. Nature Reviews. Genetics. 2007;8:272–285. doi: 10.1038/nrg2072. [DOI] [PubMed] [Google Scholar]
Soustre I, Letourneux Y, Karst F. Characterization of the Saccharomyces cerevisiae RTA1 gene involved in 7-aminocholesterol resistance. Current Genetics. 1996;30:121–125. doi: 10.1007/s002940050110. [DOI] [PubMed] [Google Scholar]
Stöver BC, Müller KF. Treegraph 2: Combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics. 2010;11:1–9. doi: 10.1186/1471-2105-11-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stritt C, Gordon SP, Wicker T, Vogel JP, Roulin AC. Recent activity in expanding populations and purifying selection have shaped transposable element landscapes across natural accessions of the Mediterranean grass brachypodium distachyon. Genome Biology and Evolution. 2018;10:304–318. doi: 10.1093/gbe/evx276. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stuart T, Eichten SR, Cahn J, Karpievitch YV, Borevitz JO, Lister R. Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation. eLife. 2016;5:e20777. doi: 10.7554/eLife.20777. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stukenbrock EH, Banke S, Javan-Nikkhah M, McDonald BA. Origin and domestication of the fungal wheat pathogen Mycosphaerella graminicola via sympatric speciation. Molecular Biology and Evolution. 2007;24:398–411. doi: 10.1093/molbev/msl169. [DOI] [PubMed] [Google Scholar]
Stukenbrock EH, Bataillon T, Dutheil JY, Hansen TT, Li R, Zala M, McDonald BA, Wang J, Schierup MH. The making of a new pathogen: Insights from comparative population genomics of the domesticated wheat pathogen Mycosphaerella graminicola and its wild sister species. Genome Research. 2011;21:2157–2166. doi: 10.1101/gr.118851.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
Torriani SFF, Melichar JPE, Mills C, Pain N, Sierotzki H, Courbot M. Zymoseptoria tritici: A major threat to wheat production, integrated approaches to control. Fungal Genetics and Biology. 2015;79:8–12. doi: 10.1016/j.fgb.2015.04.010. [DOI] [PubMed] [Google Scholar]
Walser JC, Chen B, Feder ME. Heat-shock promoters: Targets for evolution by P transposable elements in Drosophila. PLOS Genetics. 2006;2:e165. doi: 10.1371/journal.pgen.0020165. [DOI] [PMC free article] [PubMed] [Google Scholar]
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–1191. doi: 10.1093/bioinformatics/btp033. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wickham H. Ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016. [DOI] [Google Scholar]
Wickham H, Chang W. Devtools; 2016. https://CRAN.R-project.org/package=devtools [Google Scholar]
Wong WY, Simakov O, Bridge DM, Cartwright P, Bellantuono AJ, Kuhn A, Holstein TW, David CN, Steele RE, Martínez DE. Expansion of a single transposable element family is associated with genome-size increase and radiation in the genus Hydra. PNAS. 2019;116:22915–22917. doi: 10.1073/pnas.1910106116. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yew SM, Chan CL, Kuan CS, Toh YF, Ngeow YF, Na SL, Lee KW, Hoh CC, Yee WY, Ng KP. The genome of newly classified Ochroconis mirabilis: Insights into fungal adaptation to different living conditions. BMC Genomics. 2016;17:1–17. doi: 10.1186/s12864-016-2409-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhan J, Kema GHJ, Waalwijk C, McDonald BA. Distribution of mating type alleles in the wheat pathogen Mycosphaerella graminicola over spatial scales from lesions to continents. Fungal Genetics and Biology. 2002;36:128–136. doi: 10.1016/S1087-1845(02)00013-0. [DOI] [PubMed] [Google Scholar]
Zhan J, Pettway RE, McDonald BA. The global genetic structure of the wheat pathogen Mycosphaerella graminicola is characterized by high nuclear diversity, low mitochondrial diversity, regular recombination, and gene flow. Fungal Genetics and Biology. 2003;38:286–297. doi: 10.1016/s1087-1845(02)00538-8. [DOI] [PubMed] [Google Scholar]
Zhan J, Linde CC, Jürgens T, Merz U, Steinebrunner F, McDonald BA. Variation for neutral markers is correlated with variation for quantitative traits in the plant pathogenic fungus Mycosphaerella graminicola. Molecular Ecology. 2005;14:2683–2693. doi: 10.1111/j.1365-294X.2005.02638.x. [DOI] [PubMed] [Google Scholar]
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28:3326–3328. doi: 10.1093/bioinformatics/bts606. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zheng X, Gogarten SM, Lawrence M, Stilp A, Conomos MP, Weir BS, Laurie C, Levine D. Seqarray-a storage-efficient high-performance data format for WGS variant calls. Bioinformatics. 2017;33:2251–2257. doi: 10.1093/bioinformatics/btx145. [DOI] [PMC free article] [PubMed] [Google Scholar]

eLife. doi: 10.7554/eLife.69249.sa1

Decision letter

Editor: Detlef Weigel¹

Reviewed by: Marie Mirouze², Zoé Joly-Lopez³, Leandro Quadrana⁴

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

This study will appeal to a broad audience in evolutionary and population genomics. The work reveals how demographic processes have shaped transposable element dynamics in an important fungal pathogen. Many of the conclusions are likely to apply to other taxa as well.

Decision letter after peer review:

Thank you for submitting your article "A population-level invasion by transposable elements triggers genome expansion in a fungal pathogen" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Detlef Weigel as the Senior and Reviewing Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Marie Mirouze (Reviewer #1); Zoé Joly-Lopez (Reviewer #2); Leandro Quadrana (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Since calling TE insertion polymorphisms is an inherently difficult business, we strongly encourage you to validate a select number of polymorphic TE insertions using either a long read dataset or conventional molecular biology, focusing on singleton TEs, i.e., those present only in one strain. The reviewers provided several additional suggestions to exclude artifacts as much as possible.

2) Link TE density and insertion polymorphisms to local recombination rates.

3) Add statistical analyses to support qualitative statements.

4) There was confusion about the sweep scans. Please spell out in the text that TEs were associated with sweeps after these had been identified based on SNPs (at least this is my understanding of the methods; if this is wrong, please consider including sweep scans with SNPs and indels). Along these lines, it might be worth saying something about using TE insertion polymorphisms directly in sweep scans.

5) Address the apparent discrepancies between figure 4 and supplementary figure 12 and the attendant conclusions.

The reviewers made a series of additional excellent suggestions, which we ask you to pay attention to.

Reviewer #1 (Recommendations for the authors):

I enjoyed reading the paper. Here are a few points that could help clarifying your findings:

– It would be nice to validate some of the TIPS using either classical PCR or long read data.

– Most families have relatively low copy numbers (Figure S6). Is the increase in copy number sufficient to explain the genome size increase (given that a TE might be generally smaller than 10 kb on average)?

– It could be interesting to highlight what figures are based on core chromosomes and what figures include all chromosomes, as for genomicists not familiar with fungal genomes this distinction would not be evident. This is stated on line 306 but could be explicit for each figure.

– Line 283: The authors have already published the complete genomes, could they be more specific on the difference between the two datasets? Is the 2020 paper focused on one "reference" genome per geographic origin?

– Line 352: Is MFS1 corresponding to the MFS gene on Figure 4B? If so, what is the MSF1 expression level in the US genomes (or in the TE-present versus TE-absent individuals ?)– Figure 1D ii is not clear to me.– Figures S6 and S12 are very interesting and could be moved to main figures (if no space constraint).

– It would be interesting to see Figure 3a with the color-code for RLG vs RLC and DHH TE classes.

– It seems that the 5 selected candidate adaptive TEs (line 204) are not comprised in the 5 ones found in selective sweep regions (line 190). If so, could the 5 ones on selective sweep regions be cited in the text maybe?

–An example of TE burst within Oryza could be cited (Piegu et al., 2006, see Line 74).

Reviewer #2 (Recommendations for the authors):

While the type of dataset that the authors have can lead to interesting hypothesis testing for TE dynamics, the manuscript lacked clarity between the hypothesis to be tested, the results, and the link to the figures, which makes it unfortunately difficult to measure the impact of the findings.

1. The fungicide assay is used to test for potential TE insertions conferring fungicide resistance. The authors do not mention which insertion loci are being tested and the legends in the supplementary Figure 12 does not provide details. Counting 26 loci helped to then make the probable link with the 26 candidate loci with local adaptation in the North American populations, but then the link with the five TE insertion in proximity to genes (line 204) is not clear. Are the 3 significant loci part of the five described in Figure 4 and supplemental Figures 8-11? How do the authors reconcile these results with the fact that the Helitron insertion highlighted in Figure 4 is not significant in the fungicide assay (by trying to read the insertion names in Supp. Figure S12). Finally, the Methods section lacks key details as to the number of replicates for the assay and statistical analyses.

2.The authors looked at the frequency spectrum of TEs in the 6 populations to see if they correlated with potential population-specific expansions. This would provide a good indicator about the type of selection acting on these populations. The panel 6D does not support the conclusions. It is hard to understand the bar graph and it is unclear to the reader how the authors are using the frequency bins to conclude the shift. Based on the changes of the color bars, the results for the North American population do not seem to be supported. Unless there was an issue with the labeling of the figure, or that additional information in the legends should be provided to make the results stand out.

3. I may not have understood, but the authors found five loci overlapping selective sweep regions and mention that they are retrotransposons. Then the story changes its focus on 5 insertions loci in proximity to genes associated with fungicide resistance or host adaptation. It takes a good part of the paragraph to understand which insertion refers to the ones overlapping the sweep. I would have liked a more detailed description of these selective sweep region, especially regarding the regions not being associated with the fungicide resistance genes.

4. The authors mention migration bottleneck, which I assume they refer to population bottleneck caused by migration. Would it be possible to provide references to support this assumption?

5. I would suggest that the authors carefully review the figures and the corresponding legends. For example:

– Figure 2. Modify order so that the panels are called in the proper order. In A, green triangle not defined, 2C, hard to see the red vs grey line, 2E hard to distinguish among the colors, like RIX and RLG gypsy and provide legends for all dot sizes, 2F: no hexagon, do we mean arrow?

– Figure 2F: not clear where the singleton are indicated or not, but called for in the manuscript. What does "more than two copies in the populations" mean? I would assume it is two or more (not more than two) but it is not clear whether it is more than two copies per population or at least one copy in multiple populations.

– Figure 3. Dots present next to some of the chromosome numbers and it is not explained in the legends why.

– Line 179: Was it intended to say European Population or really the Australian? If so, justify why.

– Some of the Figure panels are not called in the main text (only in methods). For example only Figure 1A is mentioned in the text. I would like to see the panels B, C, D being integrated in the text to clarify how the pipeline works and the validation.

– Figure 4G in its current state does not support the conclusion that it is only found in North American populations as only the Middle-East population is compared, which was probably chosen since it represents the origin of the pathogen.

Reviewer #3 (Recommendations for the authors):

Line 166-171. I do not fully understand the analysis of the data presented in Figure 2A. It is not clear from the results and methods section how the expected distribution of TE insertions was obtained. In addition, it seems that TE insertions located very close of two genes may be counted twice, introducing potential biases in the analysis (i.e. insertions in gene-rich and gene-poor regions are counted twice or once, respectively). Independently of how this analysis was performed, I disagree with your interpretation of the results based only on selective forces, as the localization of TE insertions is the results of integration preferences and subsequent selective biases. For instance, it has been established that COPIA and MuDR transposons integrate preferentially within or upstream genes in multiple species. The effect of integration preference is particularly important in this article, as most TE insertion polymorphisms detected singletons that are expected to be the result of very recent transposition events unfiltered by natural selection.

Line 205-213 The increased frequency of the helitron insertion at MFS could be the consequence of either positive selection of the TE insertion, a linked polymorphism (genetic hitchhiking) or else background selection on the haplotype. Better evidence for positive selection on the TE insertion itself could be obtained by scanning the region for selective sweeps based on SNPs and polymorphic TEs and using site frequency spectrum-based (E.g Fst) and/or haplotype length-based (e.g. iHS). Selective sweeps signatures centered on the TE insertion would support selection on this variant.

Line 225-227. The association between TE presence and azole resistance is tantalizing, but cannot imply causality. As pointed before, the TE insertions may be simply hitchhiking a causal mutation. Moreover, I cannot find in the article any result supporting that the TE insertions associated with azole resistance also show signatures of positive selection. Indeed, it seems that none of the insertion discussed at length in Lines 205-223 show associations in Suppl. Figure 12. This is a key point of the article because without this connection it is difficult to draw any conclusion about potential benefits associated with TE insertions in Zymoseptoria tritici.

Line 354-357. As I mentioned above, I don't see any correspondence between signatures of positive selection and azole resistance for the Gypsy (RLG_Sol) insertion near the MFS gene in Chr 12 (Figure S10). Indeed, this insertion is not marked as statistically significant in Figure S11.

Figure 6D. The reduction observed in the fraction of loci with low allele frequency (10%) in North American isolates is not accompanied by an increase in the proportion of loci at high allele frequency. In other words, it seems as if the bars do not sum up 1. Please check/clarify this figure.

Work in Arabidopsis demonstrated that intraspecific genome size variation is largely explained by copy number variation in 45S rRNA (Long et al. 2013 Nat. Genet). This work should be discussed in relation to your observations, particularly given that the de novo assembly of genomes using short-reads is unable to estimate copy number variation of rRNA or centromeres.

eLife. 2021 Sep 16;10:e69249. doi: 10.7554/eLife.69249.sa2

Author response

Essential revisions:

1) Since calling TE insertion polymorphisms is an inherently difficult business, we strongly encourage you to validate a select number of polymorphic TE insertions using either a long read dataset or conventional molecular biology, focusing on singleton TEs, i.e., those present only in one strain. The reviewers provided several additional suggestions to exclude artifacts as much as possible.

We entirely agree that singleton calls in particular can benefit from additional validation (note that our procedure had already multiple validation steps beyond the standard ngs_te_mapper pipeline). Following the recommendations below, we have now taken advantage of PacBio long-read datasets to specifically screen for singleton candidates. We had in total access to seven PacBio high-coverage sequencing sets overlapping with isolates included in the previous version of the manuscript. We now report the outcome of the seven isolates in detail in the manuscript (methods/results/suppl. figures/tables). In short, we could confirm ~70% of singletons (or 22 out of 31 singletons) in all tested isolates combined. A more thorough explanation of our approach and findings is also provided below.

2) Link TE density and insertion polymorphisms to local recombination rates.

This is an excellent suggestion. We do have access to high-resolution recombination maps for the species and we integrated this now in our analyses. Overall, we found that the TE density in the reference genome (that includes also older and nested insertions) shows no difference between genome-wide and recombination hotspots. We did detect a trend towards higher TE numbers outside of recombination hotspots (adjusted for the breadth of hotspots). We mention this now in the Results and added additional panels to figure supplements.

3) Add statistical analyses to support qualitative statements.

We have carefully checked our statements and added statistical significance testing where appropriate.

4) There was confusion about the sweep scans. Please spell out in the text that TEs were associated with sweeps after these had been identified based on SNPs (at least this is my understanding of the methods; if this is wrong, please consider including sweep scans with SNPs and indels). Along these lines, it might be worth saying something about using TE insertion polymorphisms directly in sweep scans.

We now much more clearly spell out that the selection scans were based on genome-wide SNP sets (not directly TE loci). As requested, we also explain in more detail what genes / TE insertion loci were found in sweep regions. We also explain now that including TE insertion loci directly into the scans maybe problematic due to the uneven coverage and heterogeneity in linkage disequilibrium. We do think though that the population differentiation scans of TE insert frequencies provide relevant information about candidate TEs involved in adaptation.

5) Address the apparent discrepancies between figure 4 and supplementary figure 12 and the attendant conclusions.

We now more carefully explain the contents of these two figures. We replaced the candidate adaptive TE locus to increase coherence. According to a comment below, we added parts of the figure supplement to Figure 4.

The reviewers made a series of additional excellent suggestions, which we ask you to pay attention to.

Reviewer #1 (Recommendations for the authors):

I enjoyed reading the paper. Here are a few points that could help clarifying your findings:

– It would be nice to validate some of the TIPS using either classical PCR or long read data.

We entirely agree. Following up from the response above, we have used PacBio long-read data generated on an overlapping set of isolates to validate specifically singleton TEs. These are potentially the most problematic ones. In a subset of seven isolates for which we also have high-coverage PacBio long read sequences, we could confirm ~70% of singleton TEs. We describe this now in methods/results and supplementaries. Furthermore, we caution in the discussion how errors in singleton detection may affect a specific subset of our findings.

– Most families have relatively low copy numbers (Figure S6). Is the increase in copy number sufficient to explain the genome size increase (given that a TE might be generally smaller than 10 kb on average)?

This is an excellent suggestion. We have now estimated the expected sequence length added by the inserted TEs in populations with enlarged genomes and added Figure 7E to compare increased genome size and cumulative length of TEs. We added cumulative TE length to the Spearman correlation matrix as well. We find that these sequences explain ~0.19-2.55% of the genome size increase. We added an additional figure supplement to show the estimation of cumulated TE length and amount of explanation of TE length on genome size expansion. It is important to note that we present only a subset of the totality of the TE insertions due to the fact that we surely are unable to capture all TE insertions with our approach. The stringent quality filtering additionally contributes to this underrepresentation. We think that TE-mediated rearrangements and duplications could additionally explain the gap between the reported TE increases and the estimated genome size increase. We discuss these aspects now more thoroughly.

– It could be interesting to highlight what figures are based on core chromosomes and what figures include all chromosomes, as for genomicists not familiar with fungal genomes this distinction would not be evident. This is stated on line 306 but could be explicit for each figure.

We make this now more explicit as suggested by renaming “genome size” to “core genome size”.

– Line 283: The authors have already published the complete genomes, could they be more specific on the difference between the two datasets? Is the 2020 paper focused on one "reference" genome per geographic origin?

We now more explicitly explain the geographic breadth of each dataset when we mention the pangenome publication. Yes, the 2020 publication established single reference genomes (total n=19) for different regions without accounting for within-region diversity.

– Line 352: Is MFS1 corresponding to the MFS gene on Figure 4B? If so, what is the MSF1 expression level in the US genomes (or in the TE-present versus TE-absent individuals?)

Made now clearer.

– Figure 1D ii is not clear to me.

We modified the figure and improved the legend.

– Figures S6 and S12 are very interesting and could be moved to main figures (if no space constraint).

We are happy to do so. We now present S6 as part of main figure 6 and parts of S12 as part of main figure 4.

– It would be interesting to see Figure 3a with the color-code for RLG vs RLC and DHH TE classes.

We now provide a new figure supplement using a clearer color code to distinguish these specific superfamilies.

– It seems that the 5 selected candidate adaptive TEs (line 204) are not comprised in the 5 ones found in selective sweep regions (line 190). If so, could the 5 ones on selective sweep regions be cited in the text maybe?

We now provide the missing information. We ran the selective sweep analysis on iHS and XP-EHH separately. We changed the filtering order: (a) first filter for all TEs that are located in a region of selective sweep, (b) filter elements with high allele frequency increase in US/Oregon or Europe (c) then order by global pairwise F_ST. This led to us discussing a different candidate in the main texts.

–An example of TE burst within Oryza could be cited (Piegu et al., 2006, see Line 74).

Thank you. Citation added.

Reviewer #2 (Recommendations for the authors):

While the type of dataset that the authors have can lead to interesting hypothesis testing for TE dynamics, the manuscript lacked clarity between the hypothesis to be tested, the results, and the link to the figures, which makes it unfortunately difficult to measure the impact of the findings.

1. The fungicide assay is used to test for potential TE insertions conferring fungicide resistance. The authors do not mention which insertion loci are being tested and the legends in the supplementary Figure 12 does not provide details. Counting 26 loci helped to then make the probable link with the 26 candidate loci with local adaptation in the North American populations, but then the link with the five TE insertion in proximity to genes (line 204) is not clear. Are the 3 significant loci part of the five described in Figure 4 and supplemental Figures 8-11?

We apologize for the lack of clarity here. We now make the links between the different observation more explicit. We specifically mention how the datasets presented in the different figures are linked. According to reviewer #1 suggestions we moved the supplementary figure content to main Figure 5.

How do the authors reconcile these results with the fact that the Helitron insertion highlighted in Figure 4 is not significant in the fungicide assay (by trying to read the insertion names in Supp. Figure S12).

We agree that there is not enough clarity to link the different methods to find candidate TEs. We changed the filtering methods for candidates with a stronger focus onTE in regions of selective sweep (see above).

Finally, the Methods section lacks key details as to the number of replicates for the assay and statistical analyses.

We have now improved the methods section with the missing information.

2.The authors looked at the frequency spectrum of TEs in the 6 populations to see if they correlated with potential population-specific expansions. This would provide a good indicator about the type of selection acting on these populations. The panel 6D does not support the conclusions. It is hard to understand the bar graph and it is unclear to the reader how the authors are using the frequency bins to conclude the shift. Based on the changes of the color bars, the results for the North American population do not seem to be supported. Unless there was an issue with the labeling of the figure, or that additional information in the legends should be provided to make the results stand out.

We are grateful for these suggestions. We have now improved the visualization in the panel (now 6D) using fitted curves and pairwise comparison with Kolmogorov-Smirnov. We also make the legend more complete and discuss more coherently what the differences in frequency spectra suggests.

3. I may not have understood, but the authors found five loci overlapping selective sweep regions and mention that they are retrotransposons. Then the story changes its focus on 5 insertions loci in proximity to genes associated with fungicide resistance or host adaptation. It takes a good part of the paragraph to understand which insertion refers to the ones overlapping the sweep. I would have liked a more detailed description of these selective sweep region, especially regarding the regions not being associated with the fungicide resistance genes.

We have now supplemented the Results with a summary of what was found in the selective sweep regions. We also more clearly refer to how different TE insertions overlap with selective sweep regions.

4. The authors mention migration bottleneck, which I assume they refer to population bottleneck caused by migration. Would it be possible to provide references to support this assumption?

This was indeed insufficiently explained. We now refer to previous studies on global colonization history, recent gene flow and population-level diversity. We then explain more clearly how some populations have undergone a population bottleneck.

5. I would suggest that the authors carefully review the figures and the corresponding legends. For example:

– Figure 2. Modify order so that the panels are called in the proper order. In A, green triangle not defined, 2C, hard to see the red vs grey line, 2E hard to distinguish among the colors, like RIX and RLG gypsy and provide legends for all dot sizes, 2F: no hexagon, do we mean arrow?

Errors corrected, graphical improvements made as suggested and other figures are reviewed as well.

– Figure 2F: not clear where the singleton are indicated or not, but called for in the manuscript. What does "more than two copies in the populations" mean? I would assume it is two or more (not more than two) but it is not clear whether it is more than two copies per population or at least one copy in multiple populations.

We agree with the reviewer that this figure is not too clear. In the original figure singletons and TE loci with only two occurrences were not included. We changed the figure to a density plot and clarified the wording in the text and legend.

– Figure 3. Dots present next to some of the chromosome numbers and it is not explained in the legends why.

We removed the dots. We initially intended to help distinguish the numbers 6 and 9.

– Line 179 : Was it intended to say European Population or really the Australian ? If so, justify why.

We thank the reviewers for this remark and changed the text accordingly.

– Some of the Figure panels are not called in the main text (only in methods). For example only Figure 1A is mentioned in the text. I would like to see the panels B,C,D being integrated in the text to clarify how the pipeline works and the validation.

We have now checked all figures/panels, so that we mention each sequentially in the Results. Figure 1B-D were mentioned in the methods before.

– Figure 4G in its current state does not support the conclusion that it is only found in North American populations as only the Middle-East population is compared, which was probably chosen since it represents the origin of the pathogen.

We added more reference genomes to the synteny plot. While DHH_Ada is not present in this location in any other isolate, the Australian isolate shows at a similar locus a high density of several nested TE insertions

Reviewer #3 (Recommendations for the authors):

Line 166-171. I do not fully understand the analysis of the data presented in Figure 2A. It is not clear from the results and methods section how the expected distribution of TE insertions was obtained. In addition, it seems that TE insertions located very close of two genes may be counted twice, introducing potential biases in the analysis (i.e. insertions in gene-rich and gene-poor regions are counted twice or once, respectively).

We have now clarified in the legend and methods how we avoid double-counting and biases. This is now Figure 2F.

Independently of how this analysis was performed, I disagree with your interpretation of the results based only on selective forces, as the localization of TE insertions is the results of integration preferences and subsequent selective biases. For instance, it has been established that COPIA and MuDR transposons integrate preferentially within or upstream genes in multiple species. The effect of integration preference is particularly important in this article, as most TE insertion polymorphisms detected singletons that are expected to be the result of very recent transposition events unfiltered by natural selection.

Our sentences were poorly worded. We fully agree that selection is not the only factor here. We have now revised the wording and cite some relevant studies showing the breadth of scenarios.

Line 205-213 The increased frequency of the helitron insertion at MFS could be the consequence of either positive selection of the TE insertion, a linked polymorphism (genetic hitchhiking) or else background selection on the haplotype. Better evidence for positive selection on the TE insertion itself could be obtained by scanning the region for selective sweeps based on SNPs and polymorphic TEs and using site frequency spectrum-based (E.g Fst) and/or haplotype length-based (e.g. iHS). Selective sweeps signatures centered on the TE insertion would support selection on this variant.

This is an excellent suggestion. The different populations were in fact already scanned for iHS and XP-EHH based signatures. We now more explicitly state that this has been done and report the findings in for relevant TE insertion loci.

Line 225-227. The association between TE presence and azole resistance is tantalizing, but cannot imply causality. As pointed before, the TE insertions may be simply hitchhiking a causal mutation. Moreover, I cannot find in the article any result supporting that the TE insertions associated with azole resistance also show signatures of positive selection. Indeed, it seems that none of the insertion discussed at length in Lines 205-223 show associations in Suppl. Figure 12. This is a key point of the article because without this connection it is difficult to draw any conclusion about potential benefits associated with TE insertions in Zymoseptoria tritici.

Yes, we entirely agree that our manuscript does not provide conclusive evidence for adaptive functions of TE insertions. This was not our intention, and we regret the lack of clarity in our presentation. Evidence for adaptive TE insertions in the species was presented in different manuscripts but not here. We now more carefully discuss TE insertions which we consider candidates for having contributed adaptive variation and explicitly consider overlaps with selective sweeps.

Line 354-357. As I mentioned above, I don't see any correspondence between signatures of positive selection and azole resistance for the Gypsy (RLG_Sol) insertion near the MFS gene in Chr 12 (Figure S10). Indeed, this insertion is not marked as statistically significant in Figure S11.

We now more clearly mention this in the text.

Figure 6D. The reduction observed in the fraction of loci with low allele frequency (10%) in North American isolates is not accompanied by an increase in the proportion of loci at high allele frequency. In other words, it seems as if the bars do not sum up 1. Please check/clarify this figure.

We have re-checked the data presented and improved the visualization.

Work in Arabidopsis demonstrated that intraspecific genome size variation is largely explained by copy number variation in 45S rRNA (Long et al. 2013 Nat. Genet). This work should be discussed in relation to your observations, particularly given that the de novo assembly of genomes using short-reads is unable to estimate copy number variation of rRNA or centromeres.

Thank for this helpful context of genome size variation. We now mention this in the Discussion as requested.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

Hartmann FE, McDonald BA, Croll D. 2019. Population sequencing of Zymoseptoria tritici in Switzerland and Oregon (USA) NCBI BioProject. PRJNA596434
Hartmann FE, McDonald BA, Croll D. 2016. Population genomics of Zymoseptoria tritici. NCBI BioProject. PRJNA327615
Croll D, McDonald BA. 2012. Zymoseptoria tritici genome sequencing. NCBI BioProject. PRJNA178194

Supplementary Materials

Figure 1—source data 1. TE insertion validations for non-reference copies.

elife-69249-fig1-data1.txt^{(2MB, txt)}

Figure 1—source data 2. TE consensus sequences.

elife-69249-fig1-data2.txt^{(205.6KB, txt)}

Figure 2—source data 1. TEs in reference.

elife-69249-fig2-data1.txt^{(1.5KB, txt)}

Figure 2—source data 2. Presence absence matrix TE loci.

elife-69249-fig2-data2.txt^{(87.9MB, txt)}

Figure 2—source data 3. Singletons.

elife-69249-fig2-data3.txt^{(2.9KB, txt)}

Figure 4—source data 1. Top loci information.

elife-69249-fig4-data1.txt^{(3.1KB, txt)}

Figure 5—source data 1. Isolates.

elife-69249-fig5-data1.txt^{(18.3KB, txt)}

Figure 6—source data 1. Kolmogorof–Smirnov.

elife-69249-fig6-data1.txt^{(838B, txt)}

Transparent reporting form

elife-69249-transrepform1.docx^{(246.4KB, docx)}

Data Availability Statement

The following dataset was generated:

Hartmann FE, McDonald BA, Croll D. 2019. Population sequencing of Zymoseptoria tritici in Switzerland and Oregon (USA) NCBI BioProject. PRJNA596434

The following previously published datasets were used:

Hartmann FE, McDonald BA, Croll D. 2016. Population genomics of Zymoseptoria tritici. NCBI BioProject. PRJNA327615

Croll D, McDonald BA. 2012. Zymoseptoria tritici genome sequencing. NCBI BioProject. PRJNA178194

[bib1] Ali SS, Khan M, Mullins E, Doohan FM. Identification of fusarium oxysporum genes associated with lignocellulose bioconversion competency. Bioenergy Research. 2013;7:110–119. doi: 10.1007/s12155-013-9353-0. [DOI] [Google Scholar]

[bib2] Badet T, Oggenfuss U, Abraham L, McDonald BA, Croll D. A 19-isolate reference-quality global pangenome for the fungal wheat pathogen Zymoseptoria tritici. BMC Biology. 2020;18:12. doi: 10.1186/s12915-020-0744-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Badet T, Fouché S, Hartmann FE, Zala M, Croll D. Machine-learning predicts genomic determinants of meiosis-driven structural variation in a eukaryotic pathogen. Nature Communications. 2021;12:3551. doi: 10.1038/s41467-021-23862-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. Spades: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology. 2012;19:455–477. doi: 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Bao W, Kojima KK, Kohany O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mobile DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] Barnett DW, Garrison EK, Quinlan AR, Strömberg MP, Marth GT. Bamtools: A C++ API and Toolkit for Analyzing and Managing BAM Files. Bioinformatics. 2011;27:1691–1692. doi: 10.1093/bioinformatics/btr174. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] Baucom RS, Estill JC, Leebens-Mack J, Bennetzen JL. Natural selection on gene function drives the evolution of LTR retrotransposon families in the rice genome. Genome Research. 2009;19:243–254. doi: 10.1101/gr.083360.108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] Benjamini Y, Speed TP. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Research. 2012;40:1–14. doi: 10.1093/nar/gks001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] Blommaert J. Genome size evolution: Towards new model systems for old questions. Proceedings. Biological Sciences. 2020;287:20201441. doi: 10.1098/rspb.2020.1441. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Bonifacino JS, Glick BS. The mechanisms of vesicle budding and fusion. Cell. 2004;116:153–166. doi: 10.1016/s0092-8674(03)01079-1. [DOI] [PubMed] [Google Scholar]

[bib12] Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvák Z, Levin HL, Macfarlan TS, Mager DL, Feschotte C. Ten things you should know about transposable elements. Genome Biology. 2018;19:199. doi: 10.1186/s13059-018-1577-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation plink: Rising to the challenge of larger and richer datasets. GigaScience. 2015;4:1–16. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] Charlesworth B, Charlesworth D. The population dynamics of transposable elements. Genetical Research. 2009;42:1–27. doi: 10.1017/S0016672300021455. [DOI] [Google Scholar]

[bib15] Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: From conflicts to benefits. Nature Reviews. Genetics. 2017;18:71–86. doi: 10.1038/nrg.2016.139. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] Cridland JM, Macdonald SJ, Long AD, Thornton KR. Abundance and distribution of transposable elements in two Drosophila QTL mapping resources. Molecular Biology and Evolution. 2013;30:2311–2327. doi: 10.1093/molbev/mst129. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib17] Croll D, Zala M, McDonald BA. Breakage-fusion-bridge cycles and large insertions contribute to the rapid evolution of accessory chromosomes in a fungal pathogen. PLOS Genetics. 2013;9:e1003567. doi: 10.1371/journal.pgen.1003567. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] Croll D, Lendenmann MH, Stewart E, McDonald BA. The impact of recombination hotspots on genome evolution of a fungal plant pathogen. Genetics. 2015;201:1213–1228. doi: 10.1534/genetics.115.180968. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] Darriba D, Taboada GL, Doallo R, Posada D. Jmodeltest 2: More models, new heuristics and parallel computing. Nature Methods. 2012;9:772. doi: 10.1038/nmeth.2109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Star: Ultrafast Universal RNA-SEQ aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] Eichler EE, Sankoff D. Structural dynamics of eukaryotic chromosome evolution. Science. 2003;301:793–797. doi: 10.1126/science.1086132. [DOI] [PubMed] [Google Scholar]

[bib22] Estep LK, Torriani SFF, Zala M, Anderson NP, Flowers MD, McDonald BA, Mundt CC, Brunner PC. Emergence and early evolution of fungicide resistance in North American populations of Zymoseptoria tritici. Plant Pathology. 2015;64:961–971. doi: 10.1111/ppa.12314. [DOI] [Google Scholar]

[bib23] Feschotte C. Transposable elements and the evolution of regulatory networks. Nature Reviews Genetics. 2008;9:397–405. doi: 10.1038/nrg2337. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Feurtey A, Lorrain C, Croll D, Eschenbrenner C, Freitag M, Habig M, Haueisen J, Möller M, Schotanus K, Stukenbrock EH. Genome compartmentalization predates species divergence in the plant pathogen genus Zymoseptoria. BMC Genomics. 2020;21:588. doi: 10.1186/s12864-020-06871-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Fouché S, Badet T, Oggenfuss U, Plissonneau C, Francisco CS, Croll D. Stress-driven transposable element de-repression dynamics in a fungal pathogen. Molecular Biology and Evolution. 2019;37:msz216. doi: 10.1093/molbev/msz216. [DOI] [PubMed] [Google Scholar]

[bib26] Frantzeskakis L, Kracher B, Kusch S, Yoshikawa-Maekawa M, Bauer S, Pedersen C, Spanu PD, Maekawa T, Schulze-Lefert P, Panstruga R. Signatures of host specialization and a recent transposable element burst in the dynamic one-speed genome of the fungal barley powdery mildew pathogen. BMC Genomics. 2018;19:1–23. doi: 10.1186/s12864-018-4750-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] Fu Y, Kawabe A, Etcheverry M, Ito T, Toyoda A, Fujiyama A, Colot V, Tarutani Y, Kakutani T. Mobilization of a plant transposon by expression of the transposon-encoded anti-silencing factor. The EMBO Journal. 2013;32:2407–2417. doi: 10.1038/emboj.2013.169. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] Gautier M, Vitalis R. rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure. Bioinformatics. 2012;28:1176–1177. doi: 10.1093/bioinformatics/bts115. [DOI] [PubMed] [Google Scholar]

[bib29] Gilly A, Etcheverry M, Madoui M-A, Guy J, Quadrana L, Alberti A, Martin A, Heitkam T, Engelen S, Labadie K, Le Pen J, Wincker P, Colot V, Aury J-M. Te-tracker: Systematic identification of transposition events through whole-genome resequencing. BMC Bioinformatics. 2014;15:377. doi: 10.1186/s12859-014-0377-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] González-Sayer S, Oggenfuss U, García I, Aristizabal F. High-quality genome assembly of pseudocercospora ulei the main threat to natural rubber trees. Genetics and Molecular Biology. 2021 doi: 10.1590/1678-4685-GMB-2021-0051. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] Goodwin SB, M’barek SB, Dhillon B, Wittenberg AHJ, Crane CF, Hane JK, Foster AJ, Van der Lee TAJ, Grimwood J, Aerts A, Antoniw J, Bailey A, Bluhm B, Bowler J, Bristow J, van der Burgt A, Canto-Canché B, Churchill ACL, Conde-Ferràez L, Cools HJ, Coutinho PM, Csukai M, Dehal P, De Wit P, Donzelli B, van de Geest HC, van Ham RCHJ, Hammond-Kosack KE, Henrissat B, Kilian A, Kobayashi AK, Koopmann E, Kourmpetis Y, Kuzniar A, Lindquist E, Lombard V, Maliepaard C, Martins N, Mehrabi R, Nap JPH, Ponomarenko A, Rudd JJ, Salamov A, Schmutz J, Schouten HJ, Shapiro H, Stergiopoulos I, Torriani SFF, Tu H, de Vries RP, Waalwijk C, Ware SB, Wiebenga A, Zwiers L-H, Oliver RP, Grigoriev IV, Kema GHJ. Finished genome of the fungal wheat pathogen Mycosphaerella graminicola reveals dispensome structure, chromosome plasticity, and stealth pathogenesis. PLOS Genetics. 2011;7:e1002070. doi: 10.1371/journal.pgen.1002070. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Goudet J. HierFSTAT, a package for R to compute and test hierarchical f-statistics. Molecular Ecology Notes. 2005;5:184–186. doi: 10.1111/j.1471-8286.2004.00828.x. [DOI] [Google Scholar]

[bib33] Grandaubert J, Bhattacharyya A, Stukenbrock EH. RNA-seq-based gene annotation and comparative genomics of four fungal grass pathogens in the genus zymoseptoria identify novel orphan genes and species-specific invasions of transposable elements. G3: Genes, Genomes, Genetics. 2015;5:1323–1333. doi: 10.1534/g3.115.017731. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology. 2003;52:696–704. doi: 10.1080/10635150390235520. [DOI] [PubMed] [Google Scholar]

[bib35] Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of Phyml 3.0. Systematic Biology. 2010;59:307–321. doi: 10.1093/sysbio/syq010. [DOI] [PubMed] [Google Scholar]

[bib36] Guy L, Kultima JR, Andersson SGE. GENOPLOTR: Comparative gene and genome Visualization in R. Bioinformatics. 2010;26:2334–2335. doi: 10.1093/bioinformatics/btq413. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] Hartmann FE, Croll D. Distinct trajectories of massive recent gene gains and losses in populations of a microbial eukaryotic pathogen. Molecular Biology and Evolution. 2017;34:2808–2822. doi: 10.1093/molbev/msx208. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] Hartmann FE, Sánchez-Vallet A, McDonald BA, Croll D. A fungal wheat pathogen evolved host specialization by extensive chromosomal rearrangements. The ISME Journal. 2017;11:1189–1204. doi: 10.1038/ismej.2016.196. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib39] Hartmann FE, McDonald BA, Croll D. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen. Molecular Ecology. 2018;27:2725–2741. doi: 10.1111/mec.14711. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib40] Hartmann FE, Vonlanthen T, Singh NK, McDonald MC, Milgate A, Croll D. The Complex Genomic Basis of Rapid Convergent Adaptation to Pesticides across Continents in a Fungal Plant Pathogen. Molecular Ecology; 2020. [DOI] [PubMed] [Google Scholar]

[bib41] Heng L. SEQTK. 7c04ce7Github. 2021 https://github.com/lh3/seqtk/

[bib42] Hollister JD, Gaut BS. Epigenetic silencing of transposable elements: A trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Research. 2009;19:1419–1428. doi: 10.1101/gr.091678.109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib43] Huson DH, Richter DC, Rausch C, Dezulian T, Franz M, Rupp R. Dendroscope: An interactive viewer for large phylogenetic trees. BMC Bioinformatics. 2007;8:1–6. doi: 10.1186/1471-2105-8-460. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib44] Jiao WB, Schneeberger K. Chromosome-Level Assemblies of Multiple Arabidopsis thaliana Accessions Reveal Hotspots of Genomic Rearrangements. bioRxiv. 2019 doi: 10.1101/738880. [DOI] [PMC free article] [PubMed]

[bib45] Jombart T. Adegenet: A r package for the multivariate analysis of genetic markers. Bioinformatics. 2008;24:1403–1405. doi: 10.1093/bioinformatics/btn129. [DOI] [PubMed] [Google Scholar]

[bib46] Jombart T, Ahmed I. ADEGENET 1.3-1: New tools for the analysis of genome-wide SNP data. Bioinformatics. 2011;27:3070–3071. doi: 10.1093/bioinformatics/btr521. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib47] Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution. 2013;30:772–780. doi: 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib48] Kidwell MG. Transposable elements and the evolution of genome size in eukaryotes. Genetica. 2002;115:49–63. doi: 10.1023/a:1016072014259. [DOI] [PubMed] [Google Scholar]

[bib49] Krishnan P, Meile L, Plissonneau C, Ma X, Hartmann FE, Croll D, McDonald BA, Sánchez-Vallet A. Transposable element insertions shape gene regulation and melanin production in a fungal pathogen of wheat. BMC Biology. 2018;16:1–18. doi: 10.1186/s12915-018-0543-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib50] Laboratory of Evolutionary Genetics @ UNINE Datasets. swh:1:rev:3647456a2f7ed986b690501d690f09d02d3f586eSoftware Heritage. 2021 https://archive.softwareheritage.org/swh:1:rev:3647456a2f7ed986b690501d690f09d02d3f586e

[bib51] Lai X, Schnable JC, Liao Z, Xu J, Zhang G, Li C, Hu E, Rong T, Xu Y, Lu Y. Genome-wide characterization of non-reference transposable element insertion polymorphisms reveals genetic diversity in tropical and temperate maize. BMC Genomics. 2017;18:1–13. doi: 10.1186/s12864-017-4103-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib52] Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib53] Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, Carey VJ. Software for computing and annotating genomic ranges. PLOS Computational Biology. 2013;9:e1003118. doi: 10.1371/journal.pcbi.1003118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib54] Li H, Durbin R. Fast and accurate short read alignment with Burrows-wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib55] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup The sequence Alignment/map format and samtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib56] Liang X, Wang B, Dong Q, Li L, Rollins JA, Zhang R, Sun G. Pathogenic adaptations of colletotrichum fungi revealed by genome wide gene family evolutionary analyses. PLOS ONE. 2018;13:e0196303. doi: 10.1371/journal.pone.0196303. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib57] Lim JK. Intrachromosomal rearrangements mediated by hobo transposons in Drosophila melanogaster. PNAS. 1988;85:9153–9157. doi: 10.1073/pnas.85.23.9153. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib58] Linde CC, Zhan J, McDonald BA. Population Structure of Mycosphaerella graminicola: From Lesions to Continents. Phytopathology. 2002;92:946–955. doi: 10.1094/PHYTO.2002.92.9.946. [DOI] [PubMed] [Google Scholar]

[bib59] Linheiro RS, Bergman CM. Whole genome resequencing reveals natural target site preferences of transposable elements in Drosophila melanogaster. PLOS ONE. 2012;7:e30008. doi: 10.1371/journal.pone.0030008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib60] Long Q, Rabanal FA, Meng D, Huber CD, Farlow A, Platzer A, Zhang Q, Vilhjálmsson BJ, Korte A, Nizhynska V, Voronin V, Korte P, Sedman L, Mandáková T, Lysak MA, Seren Ü, Hellmann I, Nordborg M. Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden. Nature Genetics. 2013;45:884–890. doi: 10.1038/ng.2678. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib61] Lu L, Chen J, Robb SMC, Okumoto Y, Stajich JE, Wessler SR. Tracking the genome-wide outcomes of a transposable element burst over decades of amplification. PNAS. 2017;114:E10550–E10559. doi: 10.1073/pnas.1716459114. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib62] Luo S, Zhang H, Duan Y, Yao X, Clark AG, Lu J. The evolutionary arms race between transposable elements and pirnas in Drosophila melanogaster. BMC Evolutionary Biology. 2020;20:14. doi: 10.1186/s12862-020-1580-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib63] Lynch M. The Origins of Genome Architecture. Sunderland MA: Sinauer Associates; 2007. [Google Scholar]

[bib64] McDonald BA, Mundt CC, Chen R-S. The role of selection on the genetic structure of pathogen populations: Evidence from field experiments with Mycosphaerella graminicola on wheat. Euphytica. 1996;92:73–80. doi: 10.1007/BF00022831. [DOI] [Google Scholar]

[bib65] McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The genome analysis Toolkit: A mapreduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib66] Meile L, Croll D, Brunner PC, Plissonneau C, Hartmann FE, McDonald BA, Sánchez-Vallet A. A fungal avirulence factor encoded in a highly plastic genomic region triggers partial resistance to septoria tritici blotch. The New Phytologist. 2018;219:1048–1061. doi: 10.1111/nph.15180. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib67] Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, Brooks S, Howe E, Porubsky D, Logsdon GA, Schneider VA, Potapova T, Wood J, Chow W, Armstrong J, Fredrickson J, Pak E, Tigyi K, Kremitzki M, Markovic C, Maduro V, Dutra A, Bouffard GG, Chang AM, Hansen NF, Wilfert AB, Thibaud-Nissen F, Schmitt AD, Belton J-M, Selvaraj S, Dennis MY, Soto DC, Sahasrabudhe R, Kaya G, Quick J, Loman NJ, Holmes N, Loose M, Surti U, Risques RA, Graves Lindsay TA, Fulton R, Hall I, Paten B, Howe K, Timp W, Young A, Mullikin JC, Pevzner PA, Gerton JL, Sullivan BA, Eichler EE, Phillippy AM. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020;585:79–84. doi: 10.1038/s41586-020-2547-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib68] Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018;34:i142–i150. doi: 10.1093/bioinformatics/bty266. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib69] Miyao A, Tanaka K, Murata K, Sawaki H, Takeda S, Abe K, Shinozuka Y, Onosato K, Hirochika H. Target site specificity of the tos17 retrotransposon shows a preference for insertion within genes and against insertion in retrotransposon-rich regions of the genome. The Plant Cell. 2003;15:1771–1780. doi: 10.1105/tpc.012559. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib70] Nagarajan N, Pop M. Sequence assembly demystified. Nature Reviews. Genetics. 2013;14:157–167. doi: 10.1038/nrg3367. [DOI] [PubMed] [Google Scholar]

[bib71] Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, Altaf-Ul-Amin M, Ogasawara N, Kanaya S. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Research. 2011;39:e90. doi: 10.1093/nar/gkr344. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib72] Nelson MG, Linheiro RS, Bergman CM. Mcclintock: An integrated pipeline for detecting transposable element insertions in whole-genome shotgun sequencing data. G3: Genes, Genomes, Genetics. 2017;7:2763–2778. doi: 10.1534/g3.117.043893. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib73] Oliver KR, McComb JA, Greene WK. Transposable elements: Powerful contributors to angiosperm evolution and diversity. Genome Biology and Evolution. 2013;5:1886–1901. doi: 10.1093/gbe/evt141. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib74] Omrane S, Sghyer H, Audéon C, Lanen C, Duplaix C, Walker A-S, Fillinger S. Fungicide efflux and the mgmfs1 transporter contribute to the multidrug resistance phenotype in Zymoseptoria tritici Field isolates. Environmental Microbiology. 2015;17:2805–2823. doi: 10.1111/1462-2920.12781. [DOI] [PubMed] [Google Scholar]

[bib75] Omrane S, Audéon C, Ignace A, Duplaix C, Aouini L, Kema G, Walker AS, Fillinger S. Plasticity of the MFS1 Promoter Leads to Multidrug Resistance in the Wheat Pathogen Zymoseptoria tritici. MSphere. 2017;2:1–42. doi: 10.1128/mSphere.00393-17. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib76] Peter M, Kohler A, Ohm RA, Kuo A, Krützmann J, Morin E, Arend M, Barry KW, Binder M, Choi C, Clum A, Copeland A, Grisel N, Haridas S, Kipfer T, LaButti K, Lindquist E, Lipzen A, Maire R, Meier B, Mihaltcheva S, Molinier V, Murat C, Pöggeler S, Quandt CA, Sperisen C, Tritt A, Tisserant E, Crous PW, Henrissat B, Nehls U, Egli S, Spatafora JW, Grigoriev IV, Martin FM. Ectomycorrhizal ecology is imprinted in the genome of the dominant symbiotic fungus cenococcum Geophilum. Nature Communications. 2016;7:1–15. doi: 10.1038/ncomms12662. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib77] Petrov DA, Aminetzach YT, Davis JC, Bensasson D, Hirsh AE. Size matters: Non-ltr retrotransposable elements and ectopic recombination in Drosophila. Molecular Biology and Evolution. 2003;20:880–892. doi: 10.1093/molbev/msg102. [DOI] [PubMed] [Google Scholar]

[bib78] Piegu B, Guyot R, Picault N, Roulin A, Sanyal A, Saniyal A, Kim H, Collura K, Brar DS, Jackson S, Wing RA, Panaud O. Doubling genome size without polyploidization: Dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Research. 2006;16:1262–1269. doi: 10.1101/gr.5290206. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib79] Plissonneau C, Stürchler A, Croll D. The evolution of orphan regions in genomes of a fungal pathogen of wheat. MBio. 2016;7:e01231-16. doi: 10.1128/mBio.01231-16. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib80] Quadrana L, Bortolini Silveira A, Mayhew GF, LeBlanc C, Martienssen RA, Jeddeloh JA, Colot V. The Arabidopsis thaliana mobilome and its impact at the species level. eLife. 2016;5:e15716. doi: 10.7554/eLife.15716. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib81] Quinlan AR, Hall IM. Bedtools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib82] R Development Core Team . Vienna, Austria: R Foundation for Statistical Computing; 2017. http://www.r-project.org [Google Scholar]

[bib83] Raffaele S, Kamoun S. Genome evolution in filamentous plant pathogens: Why bigger can be better. Nature Reviews. Microbiology. 2012;10:417–430. doi: 10.1038/nrmicro2790. [DOI] [PubMed] [Google Scholar]

[bib84] Rice P, Longden I, Bleasby A. EMBOSS: The European Molecular Biology open Software Suite. Trends in Genetics. 2000;16:276–277. doi: 10.1016/s0168-9525(00)02024-2. [DOI] [PubMed] [Google Scholar]

[bib85] Ritz C, Streibig JC. Bioassay analysis using R. Journal of Statistical Software. 2005;12:1–22. doi: 10.18637/jss.v012.i05. [DOI] [Google Scholar]

[bib86] Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative Genome Viewer. Nature Biotechnology. 2011;29:24–26. doi: 10.1038/nbt.1754. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib87] Rogers PD, Barker KS. Genome-wide expression profile analysis reveals coordinately regulated genes associated with stepwise acquisition of azole resistance in Candida albicans clinical isolates. Antimicrobial Agents and Chemotherapy. 2003;47:1220–1227. doi: 10.1128/AAC.47.4.1220-1227.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib88] Rogers J, Gibbs RA. Comparative primate genomics: emerging patterns of genome content and dynamics. Nature Reviews. Genetics. 2014;15:347–359. doi: 10.1038/nrg3707. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib89] Rouxel T, Grandaubert J, Hane JK, Hoede C, van de Wouw AP, Couloux A, Dominguez V, Anthouard V, Bally P, Bourras S, Cozijnsen AJ, Ciuffetti LM, Degrave A, Dilmaghani A, Duret L, Fudal I, Goodwin SB, Gout L, Glaser N, Linglin J, Kema GHJ, Lapalu N, Lawrence CB, May K, Meyer M, Ollivier B, Poulain J, Schoch CL, Simon A, Spatafora JW, Stachowiak A, Turgeon BG, Tyler BM, Vincent D, Weissenbach J, Amselem J, Quesneville H, Oliver RP, Wincker P, Balesdent M-H, Howlett BJ. Effector diversification within compartments of the Leptosphaeria maculans genome affected by repeat-induced point mutations. Nature Communications. 2011;2:202. doi: 10.1038/ncomms1189. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib91] Sachin P. RNA-Seq. 3e1c4b1GitHub. 2021 https://github.com/spundhir/RNA-Seq/blob/main/get_gc_content.pl

[bib92] SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nature Genetics. 1998;20:43–45. doi: 10.1038/1695. [DOI] [PubMed] [Google Scholar]

[bib93] Shen MR, Batzer MA, Deininger PL. Evolution of the master Alu gene(s. Journal of Molecular Evolution. 1991;33:311–320. doi: 10.1007/BF02102862. [DOI] [PubMed] [Google Scholar]

[bib94] Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351. [DOI] [PubMed] [Google Scholar]

[bib95] Singh NK, Chanclud E, Croll D. Population-Level deep sequencing reveals the interplay of clonal and sexual reproduction in the fungal wheat pathogen Zymoseptoria tritici. bioRxiv. 2020 doi: 10.1101/2020.07.07.191510. [DOI] [PMC free article] [PubMed]

[bib96] Sirisattha S, Momose Y, Kitagawa E, Iwahashi H. Toxicity of anionic detergents determined by Saccharomyces cerevisiae microarray analysis. Water Research. 2004;38:61–70. doi: 10.1016/j.watres.2003.08.027. [DOI] [PubMed] [Google Scholar]

[bib97] Slotkin RK, Martienssen R. Transposable elements and the epigenetic regulation of the genome. Nature Reviews. Genetics. 2007;8:272–285. doi: 10.1038/nrg2072. [DOI] [PubMed] [Google Scholar]

[bib98] Soustre I, Letourneux Y, Karst F. Characterization of the Saccharomyces cerevisiae RTA1 gene involved in 7-aminocholesterol resistance. Current Genetics. 1996;30:121–125. doi: 10.1007/s002940050110. [DOI] [PubMed] [Google Scholar]

[bib99] Stöver BC, Müller KF. Treegraph 2: Combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics. 2010;11:1–9. doi: 10.1186/1471-2105-11-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib100] Stritt C, Gordon SP, Wicker T, Vogel JP, Roulin AC. Recent activity in expanding populations and purifying selection have shaped transposable element landscapes across natural accessions of the Mediterranean grass brachypodium distachyon. Genome Biology and Evolution. 2018;10:304–318. doi: 10.1093/gbe/evx276. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib101] Stuart T, Eichten SR, Cahn J, Karpievitch YV, Borevitz JO, Lister R. Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation. eLife. 2016;5:e20777. doi: 10.7554/eLife.20777. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib102] Stukenbrock EH, Banke S, Javan-Nikkhah M, McDonald BA. Origin and domestication of the fungal wheat pathogen Mycosphaerella graminicola via sympatric speciation. Molecular Biology and Evolution. 2007;24:398–411. doi: 10.1093/molbev/msl169. [DOI] [PubMed] [Google Scholar]

[bib103] Stukenbrock EH, Bataillon T, Dutheil JY, Hansen TT, Li R, Zala M, McDonald BA, Wang J, Schierup MH. The making of a new pathogen: Insights from comparative population genomics of the domesticated wheat pathogen Mycosphaerella graminicola and its wild sister species. Genome Research. 2011;21:2157–2166. doi: 10.1101/gr.118851.110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib104] Torriani SFF, Melichar JPE, Mills C, Pain N, Sierotzki H, Courbot M. Zymoseptoria tritici: A major threat to wheat production, integrated approaches to control. Fungal Genetics and Biology. 2015;79:8–12. doi: 10.1016/j.fgb.2015.04.010. [DOI] [PubMed] [Google Scholar]

[bib105] Walser JC, Chen B, Feder ME. Heat-shock promoters: Targets for evolution by P transposable elements in Drosophila. PLOS Genetics. 2006;2:e165. doi: 10.1371/journal.pgen.0020165. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib106] Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–1191. doi: 10.1093/bioinformatics/btp033. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib107] Wickham H. Ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016. [DOI] [Google Scholar]

[bib108] Wickham H, Chang W. Devtools; 2016. https://CRAN.R-project.org/package=devtools [Google Scholar]

[bib109] Wong WY, Simakov O, Bridge DM, Cartwright P, Bellantuono AJ, Kuhn A, Holstein TW, David CN, Steele RE, Martínez DE. Expansion of a single transposable element family is associated with genome-size increase and radiation in the genus Hydra. PNAS. 2019;116:22915–22917. doi: 10.1073/pnas.1910106116. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib110] Yew SM, Chan CL, Kuan CS, Toh YF, Ngeow YF, Na SL, Lee KW, Hoh CC, Yee WY, Ng KP. The genome of newly classified Ochroconis mirabilis: Insights into fungal adaptation to different living conditions. BMC Genomics. 2016;17:1–17. doi: 10.1186/s12864-016-2409-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib111] Zhan J, Kema GHJ, Waalwijk C, McDonald BA. Distribution of mating type alleles in the wheat pathogen Mycosphaerella graminicola over spatial scales from lesions to continents. Fungal Genetics and Biology. 2002;36:128–136. doi: 10.1016/S1087-1845(02)00013-0. [DOI] [PubMed] [Google Scholar]

[bib112] Zhan J, Pettway RE, McDonald BA. The global genetic structure of the wheat pathogen Mycosphaerella graminicola is characterized by high nuclear diversity, low mitochondrial diversity, regular recombination, and gene flow. Fungal Genetics and Biology. 2003;38:286–297. doi: 10.1016/s1087-1845(02)00538-8. [DOI] [PubMed] [Google Scholar]

[bib113] Zhan J, Linde CC, Jürgens T, Merz U, Steinebrunner F, McDonald BA. Variation for neutral markers is correlated with variation for quantitative traits in the plant pathogenic fungus Mycosphaerella graminicola. Molecular Ecology. 2005;14:2683–2693. doi: 10.1111/j.1365-294X.2005.02638.x. [DOI] [PubMed] [Google Scholar]

[bib114] Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28:3326–3328. doi: 10.1093/bioinformatics/bts606. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib115] Zheng X, Gogarten SM, Lawrence M, Stilp A, Conomos MP, Weir BS, Laurie C, Levine D. Seqarray-a storage-efficient high-performance data format for WGS variant calls. Bioinformatics. 2017;33:2251–2257. doi: 10.1093/bioinformatics/btx145. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

A population-level invasion by transposable elements triggers genome expansion in a fungal pathogen

Ursula Oggenfuss

Thomas Badet

Thomas Wicker

Fanny E Hartmann

Nikhil Kumar Singh

Leen Abraham

Petteri Karisto

Tiziana Vonlanthen

Christopher Mundt

Bruce A McDonald

Daniel Croll

Roles

Abstract

Introduction

Results

A dynamic TE landscape shaped by strong purifying selection

Figure 1. Robust discovery and validation of transposable element (TE) insertions: (A) General analysis pipeline.

Figure 1—figure supplement 1. Validation of transposable element (TE) insertion predictions.

Figure 1—figure supplement 2. Establishment of transposable element (TE) loci with differing start and end positions in the isolates.

Figure 1—figure supplement 3. Bias for reads with a GC content lower than 30% per population.

Figure 2. Transposable element (TE) landscape across populations.

Figure 2—figure supplement 1. Validation of singleton insertions detected by mapped Illumina reads using PacBio read alignments for confirmation.

Figure 2—figure supplement 2. TE insertion loci characteristics.

Figure 2—figure supplement 3. Hierarchy superfamilies.

Detection of candidate TE loci underlying recent adaptation

Figure 3. Differentiation in transposable element insertion frequencies across the genome.

Figure 3—figure supplement 1. Global pairwise FST distributions shown separately for the 21 chromosomes.

Figure 4. Candidate adaptive transposable element (TE) insertions.

Figure 4—figure supplement 1. Additional top loci.

Population-level expansions in TE content

Figure 5. Population differentiation at transposable element (TE) and genome-wide SNP loci.

Figure 6. Global population structure of transposable element (TE) insertion polymorphism.

Figure 6—figure supplement 1. Population changes additional.

Figure 6—figure supplement 2. Heatmap loci.

TE-mediated genome size expansions

Figure 7. Core genome size and transposable element (TE) evolution across populations.

Figure 7—figure supplement 1. Genome size expansion.

Discussion

Recent selection acting on TE insertions

Population-level TE invasions and relaxed selection

TE invasion dynamics underpins genome size expansions

Materials and methods

Fungal isolate collection and sequencing

TE insertion detection

Downsampling analysis

Validation procedure for predicted TE insertions

Clustering of TE insertions into loci

Evaluation of singleton insertions

Population differentiation in TE frequencies

Genomic location of TE insertions

Core genome size estimation

Fungicide resistance assay

Acknowledgements

Funding Statement

Contributor Information

Funding Information

Additional information

Competing interests

Author contributions

Additional files

Data availability

References

Decision letter

Roles

Author response

Associated Data

Data Citations

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases