PLOS Pathogens. 2023 Feb 14;19(2):e1011130. doi: 10.1371/journal.ppat.1011130

Recent transposable element bursts are associated with the proximity to genes in a fungal plant pathogen

Ursula Oggenfuss 1,¤, Daniel Croll 1,*
Editor: Bart Thomma2
PMCID: PMC9970103  PMID: 36787337

Abstract

The activity of transposable elements (TEs) contributes significantly to pathogen genome evolution. TEs often destabilize genome integrity but may also confer adaptive variation in pathogenicity or resistance traits. De-repression of epigenetically silenced TEs often initiates bursts of transposition activity that may be counteracted by purifying selection and genome defenses. However, how these forces interact to determine the expansion routes of TEs within a pathogen species remains largely unknown. Here, we analyzed a set of 19 telomere-to-telomere genomes of the fungal wheat pathogen Zymoseptoria tritici. Phylogenetic reconstruction and ancestral state estimates of individual TE families revealed that TEs have undergone distinct activation and repression periods resulting in highly uneven copy numbers between genomes of the same species. Most TEs are clustered in gene-poor niches, indicating strong purifying selection against insertions near coding sequences, or as a consequence of insertion site preferences. TE families with high copy numbers have low sequence divergence and strong signatures of defense mechanisms (i.e., RIP). In contrast, small non-autonomous TEs (i.e., MITEs) are less impacted by defense mechanisms and are often located in close proximity to genes. Individual TE families have experienced multiple distinct burst events that generated many nearly identical copies. We found that a Copia element burst was initiated from recent copies inserted substantially closer to genes compared to older copies. Overall, TE bursts tended to initiate from copies in GC-rich niches that escaped inactivation by genomic defenses. Our work shows how specific features of the genomic environment provide triggers for TE proliferation in pathogen genomes.

Author summary

Transposable elements (TEs) are engines of evolution over short and long evolutionary time scales and have played crucial roles in pathogen evolution. The impacts of TEs are multifaceted, ranging from creating adaptive sequence variants, gene disruptions and chromosomal rearrangements to triggering genome expansions. As a defense, pathogen genomes have evolved sophisticated mechanisms to silence or mutate TEs. Pathogens have also benefited from TEs thanks to altered virulence genes and increased antifungal resistance. However, how TEs cope with genomic defenses and expand in genomes (i.e., cause TE bursts) remains poorly understood. We analyzed over a dozen high-quality genomes of a fungal wheat pathogen species, which has recently experienced TE reactivations. We reconstructed the evolutionary history of many TEs by building phylogenetic trees. Using this approach, we identified "invasion routes", i.e., tracking TE copies that constitute the most likely ancestors of renewed activity of TEs (i.e., bursts). Our work showed that specific features, in particular the proximity to genes, were likely important drivers leading to the reactivation of TEs.

Introduction

Transposable elements (TEs) are mobile DNA sequences in the genome. TEs create novel insertions by duplication or relocation of existing copies in the genome. Active TEs can proliferate in the genome at a very rapid pace (i.e., burst), leading to disruption of coding regions, increased ectopic recombination rates, chromosomal rearrangements and finally genome size expansion. In the absence of bursts, most TE families are expected to reach a plateau in copy numbers determined by a balance between insertion and deletion rates [1–3]. Environmental stress factors are often the trigger of TE de-repression, which can in turn lead to bursts creating new copies [4,5]. Bursts of TE activity are often followed by the activation of genomic defenses [3]. Even though some TE activity can be beneficial and drive short term adaptation to new environments, the TE insertion dynamics can be highly deleterious [2,6,7]. To counter TE activity, defense mechanisms have evolved to reversibly inactivate (i.e., silence) or irreversibly mutate TEs [8,9]. Defenses against TEs include histone modifications, cytosine methylation, small RNA based silencing or KRAB zinc finger based transcriptional silencing [10–13]. Some TEs have the ability to regulate their own expression through small RNAs [14]. In certain ascomycete fungi, TEs are additionally targeted by repeat-induced point mutations (RIP) during sexual recombination [15–17]. RIP has the potential to introduce CpA to TpA mutations in any copy of a duplicated sequence, leading to decreased GC contents and a loss-of-function risk at targeted loci. In the ascomycete Neurospora crassa, a few generations of sexual recombination are sufficient to degenerate TE copies [18]. RIP has reduced efficacy on sequences shorter than ~500 bp in N. crassa and has almost no impact on repeats below ~200 bp, allowing small TEs to escape [19]. Defense mechanisms against TEs may be weakened under stress conditions or lost over evolutionary time scales [20–22]. In addition to defenses against TEs, TE bursts are counterbalanced by deletion via ectopic recombination at the individual level, and via purifying selection and genetic drift at the population and species level [1,2,23].

The combination of TE activity, defense mechanisms and selection shapes the genomic environment. Chromosomal sequences are often highly heterogeneous and contain niches with different levels of TE or gene density, or different chromatin states [24,25]. How TE insertions reshape the genomic landscape depends on the fitness effects of new insertions [26]. Generally, strong purifying selection acts against most new insertions in plant, animal and fungal species [27–31]. Yet, the genomic environment can differ dramatically between species. Many yeast species with extremely low repeat content carry genomes with generally only gene-dense niches [32,33]. In contrast, maize carries a very large repeat-rich genome, with a low gene content and highly isolated genes [34]. New TEs are more likely to insert into a coding region and have deleterious effects in compact yeast genomes, compared to maize, where a novel TE most likely will insert into other TEs creating nested insertions. Such distinct properties of genomes can be framed as “resistance” or “tolerance” towards TE insertions [35]. Genome compartmentalization in plant pathogens is also referred to as the two-speed genome with the tight links between detrimental and beneficial effects of TE insertions being framed as a Devil’s Bargain [35–39]. In two-speed genomes, genomic niches are well pronounced, dividing gene-rich, repeat-poor niches, and repeat-rich niches that harbor genes associated with the interaction with the plant host. Rapid diversification of such genes can be driven by the genomic environment, in particular TEs [37,40,41]. Finally, the epigenetic landscape diversifies genomic niches into transcriptionally open euchromatin (H3K4me) and heterochromatin niches tightly packed around histones. Generally, TEs are unable to insert into heterochromatic regions, nor are TEs in such regions transcriptionally active [42]. Facultative heterochromatin (H3K27me3) marks many TE-dense regions, which can be de-repressed in fungal plant pathogens (e.g., induced by stress during infection), potentially leading to TE activation [5,43]. Other regions with heterochromatin marks (constitutive heterochromatin; H3K9me3) remain repressed during stress conditions, and TEs will not be activated [5,44].

TEs are highly diverse in terms of length, coding regions or transposition mechanism. Generally, TEs are clustered into retrotransposons and DNA transposons, each with a number of orders and superfamilies [45,46]. Retrotransposons create new copies via an RNA intermediate and a copy-and-paste mechanism, and are separated into elements containing long terminal repeats (LTRs, e.g., Copia or Ty3) and non-LTR (e.g., LINEs) [45]. DNA transposons are separated into DNA transposons that use a cut-and-paste mechanism and contain terminal inverted repeats, and Helitrons with a peel-and-paste mechanism [45,47]. Some TEs, including MITEs (miniature inverted repeat transposable elements) lost their coding region and rely on the transposition mechanism of full-length TEs. At the level of sequence similarity, TEs can further be grouped into TE families. The activation and burst of a TE family will create a high number of novel TE insertions that are exact copies, that only diverge slowly with time [48]. Hence, phylogenetic analysis of all TE copies of a family can be used to reconstruct the evolutionary history of the TE family. Analogous to viral birth-death models, bursts of transpositions should leave distinct marks of short internal branches in phylogenetic trees [49,50]. Transposition bursts are characterized by a most recent common ancestor likely reflecting the copy initiating the expansion. In contrast, copies with long terminal branches have likely been silenced or mutated via RIP. TEs with long terminal branches have likely lost functionality, will initially remain in populations as remnants, accumulate mutations and ultimately degenerate. Reconstructing the succession of events leading to transposition bursts is often challenged by the difficulty in recovering all copies of a TE within a given species. The difficulty stems from the incomplete nature of many genome assemblies and the fact that TE copies are not fixed in populations. 
Recovering full-length copies of TEs that resulted from transposition bursts remains challenging without high-quality genome assemblies. Additionally, individual genomes of a species typically carry only a small subset of all TE copies present within the species.

Zymoseptoria tritici (previously Mycosphaerella graminicola) is an important fungal plant pathogen on wheat that shares an extended history of co-evolution with its host [51]. TEs cover 16.5–24% of the genome, often located in repeat-dense niches with genes involved in the interaction with the host [52]. Z. tritici has a moderate TE content, and recent TE activity has produced both beneficial insertions and negative impacts on the genome structure. TE activity is likely associated with incipient genome size expansions [31] and the emergence of adaptive traits. For example, different TEs have inserted in the promoter region of a major facilitator superfamily transporter gene and facilitated multidrug resistance in Z. tritici [53–55]. Furthermore, increased activity of a DNA transposon reduced asexual spore production, melanization and virulence [56–58]. Cytosine methylation was lost after a duplication event of the DNA methyltransferase MgDNMT gene because subsequent RIP mutations rendered all copies non-functional [59,60]. Some populations of the pathogen have likely lost dim2, an essential gene of the RIP machinery [61,62]. In contrast to older copies, a subset of young TEs show no apparent RIP signatures in recently established populations outside of the center of origin [63]. The loss of RIP seems to be a gradual process accompanying the global spread of the pathogen. Histone modifications and small RNAs likely contribute to silencing of TEs [64–66]. The TE repertoire includes 304 TE families based on an analysis of 19 completely assembled genomes [52]. A subset of TEs show evidence for de-repression during plant infection likely connected to bursts of TE proliferation [5,31,52].

Here, we propose a pangenome-based approach to reconstruct the recent evolutionary history of TEs in Z. tritici. We retraced the invasion dynamics of young TEs using phylogenetics to determine triggers of TE expansions. We established near total evidence for all copies of multiple TE families by gathering information from 19 telomere-to-telomere genomes combined into a pangenome. We used phylogenetic reconstruction of ancestral TE states to identify genomic niches of TE activation and proliferation. The broad view of TE invasion routes suggests that TE copies near genes act as triggers for copy number expansions. Escape from genomic defenses is likely a major driver of TE dynamics.

Results

Transposable element diversity within Z. tritici

We analyzed 19 chromosome-level genomes to comprehensively map the genome-wide distribution of TE families in the fungal wheat pathogen Z. tritici (Fig 1A; [52]). Genome size and TE content vary considerably among individuals and show a positive correlation (r = 0.78, p < 0.001) [52]. We grouped TEs into MITEs, RLC/RLG, LINE and others (Fig 1B) with consensus sequences and individual TE copies accessible on Zenodo (https://zenodo.org/record/7344421). The genomes of an Australian and an Iranian isolate have the highest TE content (24%) and lowest TE content (16.5%), as well as the largest genome (41.76 Mb) and smallest genome (37.13 Mb), respectively (Fig 1C and S1 Table). We focused on TE families with at least 20 copies, which collectively have a total of 23,395 copies across all analyzed genomes. Around half of the copies belong to DNA transposons (n = 10,586 copies in 104 families), half to retrotransposons (n = 11,907 copies in 59 families; S2 Table) and only a few remained unclassified (n = 902 copies in seven families) without meaningful differences among genomes (S2 Table and S1 Fig). We compared frequencies of locus-specific TE copies between the 19 genomes, and found most TE copies to be singletons or at low frequency (Fig 1D). Only a few TE loci were fixed (n = 122; Fig 1D) and these predominantly belong to MITEs.

Fig 1.

Transposable element (TE) distribution in 19 telomere-to-telomere genomes of Zymoseptoria tritici: (A) Origin of reference genome isolates originally used for the Z. tritici pangenome reported by Badet et al [52]. The color indicates the status of the dim2 gene with an important influence on RIP. Data from Möller et al [62]: dark blue = present and functional, bright blue = recently mutated, red = non-functional. Map created with ggmap version 3.0.0 [73]. Map data from OpenStreetMap: https://www.openstreetmap.org/copyright. (B) Genome size and TE copy number per isolate. Circle sizes indicate the genome size, the green shade indicates the TE content. The colors indicate MITEs (miniature inverted repeat transposable elements, small non-autonomous DNA transposons corresponding to several TE superfamilies), RLC and RLG (two superfamilies belonging to LTR) and LINE. (C) Copy numbers of TEs (left) and total length (right) in all 19 genomes. Smaller boxes correspond to TE families. (D) Allele frequency distribution of TEs at orthologous loci among genomes. TEs were defined as orthologous if they were located between the same set of orthologous genes.
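The orthology rule in panel (D) can be illustrated with a minimal Python sketch. It assumes that each TE copy has already been annotated with its genome, its TE family and the orthogroups of its two flanking genes. The tuple layout, the function names and the additional requirement that copies share the same family are illustrative assumptions, not the pipeline used in the study.

```python
from collections import defaultdict


def te_locus_frequencies(copies):
    """Count in how many genomes each TE locus occurs.

    `copies` is an iterable of (genome, family, left_orthogroup,
    right_orthogroup) tuples (hypothetical layout). Two copies are treated
    as the same locus when they share the TE family (an added assumption)
    and both flanking orthologous genes.
    """
    genomes_per_locus = defaultdict(set)
    for genome, family, left_og, right_og in copies:
        # Sort the flanks so that strand orientation does not split a locus.
        locus = (family, *sorted((left_og, right_og)))
        genomes_per_locus[locus].add(genome)
    return {locus: len(genomes) for locus, genomes in genomes_per_locus.items()}


def classify_loci(frequencies, n_genomes):
    """Split loci into singletons, fixed loci and intermediate-frequency loci."""
    summary = {"singleton": 0, "intermediate": 0, "fixed": 0}
    for count in frequencies.values():
        if count == 1:
            summary["singleton"] += 1
        elif count == n_genomes:
            summary["fixed"] += 1
        else:
            summary["intermediate"] += 1
    return summary
```

Calling `classify_loci(te_locus_frequencies(copies), n_genomes=19)` would give a coarse version of the frequency spectrum; a histogram of the raw counts corresponds more closely to the distribution shown in the panel.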

Niches of transposable elements have low gene content

We assessed variation in the genomic environment in 5 kb windows both up- and downstream of each TE copy (Fig 2A). For the reference genome IPO323, we found no accumulation of TE copies in niches with marks of open chromatin (i.e., euchromatin; H3K4me2). A small subset of TE copies overlaps with constitutive heterochromatin marks (H3K9me3), or facultative heterochromatin (H3K27me3) (Fig 2A). Across all 19 genomes, we found that most TEs are located on core chromosomes (i.e., chromosomes shared among all isolates), but TE copies are at higher density on accessory chromosomes (Fig 2B). The majority of TE copies are located in niches with a GC content below 50%, with the exception of MITEs (average of 52.0% GC; Fig 2B). Only a small subset of TE copies is in niches overlapping a gene or a subtelomeric region (Fig 2B). We found no overall association between TE copies and large RIP-affected regions. However, most RLC, RLG and LINEs are located inside and most MITEs outside of large RIP-affected regions (Fig 2B).
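As an illustration of how a niche statistic such as GC content can be derived from flanking windows, the following Python sketch pools the up- and downstream windows of a TE copy. Pooling both windows, excluding the TE itself and the function names are assumptions made for the example; the study may have handled these details differently.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else float("nan")


def niche_gc(chrom_seq, te_start, te_end, flank=5000):
    """GC fraction of the windows flanking a TE copy.

    Coordinates are 0-based and half-open. The TE itself is excluded and
    windows are truncated at chromosome ends.
    """
    upstream = chrom_seq[max(0, te_start - flank):te_start]
    downstream = chrom_seq[te_end:te_end + flank]
    return gc_fraction(upstream + downstream)
```

With `flank=5000` the function mirrors the 5 kb window size stated above; other niche metrics (gene or TE content) would follow the same windowing logic with interval overlaps instead of base counts.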

Fig 2.

Characteristics of TE niches across the genome: (A) Proportional overlap of H3K27me3, H3K4me2 and H3K9me3 histone methylation marks with TE niches in the reference genome IPO323. Colors indicate the group of TE. (B) TE copy numbers between core and accessory chromosomes (copy number and density); in and outside the subtelomeric region (copy number and density); in and outside large RIP-affected regions (LRAR; copy number and density); in niches with a moderate (≥ 50%) or low (< 50%) GC content; and overlapping regions annotated as genes.

More than one third of TEs are inserted into niches with more than 80% TE content. In contrast, MITEs are preferentially inserted into TE-poor niches (Fig 3A). GC content in TE copy niches varies between 25% and 60%. We found more than one third of TE copies 1–10 kb away from the next gene, with MITEs being on average closer (Fig 3B). TE copies are often close to RLC (Copia/Ty1), RLG (Ty3, formerly also Gypsy, see [67]) and LINE copies (902 and 2,431 bp average distance, respectively). MITEs generally are at a distance of 8,037 bp from the next TE on average (Fig 3C). Overall, TE density is negatively correlated with gene density and GC content (Fig 3D). TEs and TE fragments belonging to families with longer consensus sequences tend to be located in already TE-rich niches. RIP and GC content are strongly negatively correlated, yet low GC content is unlikely to be explained by RIP alone (Fig 3D).
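The distance from a TE copy to the closest gene can be sketched as a simple interval calculation. The example below is a minimal illustration that assumes gene coordinates on the same chromosome are available as (start, end) tuples; the function name and coordinate convention are hypothetical.

```python
def distance_to_closest_gene(te_start, te_end, genes):
    """Distance in bp from a TE copy to the nearest gene on the same chromosome.

    `genes` is a list of (start, end) tuples in 0-based, half-open
    coordinates. Returns 0 when the TE overlaps a gene and None when no
    gene is annotated on the chromosome.
    """
    best = None
    for gene_start, gene_end in genes:
        if gene_start < te_end and te_start < gene_end:
            return 0  # the TE overlaps this gene
        if gene_start >= te_end:
            gap = gene_start - te_end  # gene lies downstream of the TE
        else:
            gap = te_start - gene_end  # gene lies upstream of the TE
        if best is None or gap < best:
            best = gap
    return best
```

The same function applied to a list of TE intervals (excluding the focal copy) gives the distance to the next TE.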

Fig 3.

Distribution of niche and TE copy characteristics: (A) Overlap of TE content, gene content and GC content with TE copy niches. (B) Distribution of the distance to the closest gene in MITEs, RLC/RLG, LINE and other TEs. The red line indicates the mean distance. (C) Distances to the next TE for MITEs, RLC/RLG, LINE and other TEs. The red line indicates the mean distance. (D) Pearson correlation matrix of 11 characteristics of TE copy niches and TE copy characteristics. Dark red indicates strong positive correlation, dark blue indicates strong negative correlation of two characteristics. * p < 0.05, ** p < 0.01, *** p < 0.001.

Recent activity of high-copy transposable element families

Recently active TE families typically carry a high number of similar TE copies in the genome. We first filtered for a subset of TE families with more than 100 copies in all 19 genomes combined. The 61 retained TE families predominantly include MITEs (n = 12) as well as RLC and RLG (n = 5 and 11, respectively; S2A Fig). We find that high-copy TE families tend to also have more variable copy numbers among 19 genomes (Fig 4A). The GC content of high copy TE families generally is lower than 50%, with the exception of MITEs that have an overall higher GC content around 52%, and an extreme case with the RLC_Deimos family where most copies have a GC content below 40% (Fig 4B). Full-length TE copies range from 218–13,907 bp, with the shorter copies belonging to the non-autonomous MITEs lacking coding regions and the longest copies belonging to RLG, RLC, DHH and RIX (Fig 4C). To estimate the genetic distance of TE copies, we calculated the nucleotide diversity for each TE family. RIP activity on TE copies is expected to increase genetic distances among copies. TE families with the highest copy numbers showed lower nucleotide diversity consistent with recent proliferation in the genome, and despite having stronger signatures of RIP. MITEs tend to have higher nucleotide diversity at similar copy numbers compared to other TE families (Fig 4D and 4F). MITEs are also less affected by RIP (Fig 4E and 4F). Terminal branch lengths of individual TE copies are a further indication of increased genetic distances. Copies of MITEs tend to have overall short terminal branch lengths compared to other TEs (S2B Fig). The short length of MITEs might constrain the potential to accumulate mutations compared to longer TEs. Consistent with this, many MITEs show long internal branch lengths between distinct clades characterizing independent bursts (S3 Fig). 
Overall, TE families with high copy numbers and long consensus sequences show lower nucleotide diversity; however, most of the mutations are RIP-like mutations (Fig 4F).
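For readers unfamiliar with the statistic, nucleotide diversity of a TE family can be sketched as the mean pairwise proportion of differing sites in a multiple sequence alignment of its copies. The Python example below is one common formulation and is only an assumption about how such a value may be computed; the estimator and the gap handling used in the study are not specified in this section.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites among aligned sequences.

    Sites where either sequence carries a gap ('-') or 'N' are skipped for
    that pair. Returns NaN if no pair shares a comparable site.
    """
    total, pairs = 0.0, 0
    for a, b in combinations(aligned, 2):
        compared = differences = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "-N" or y in "-N":
                continue
            compared += 1
            differences += x != y
        if compared:
            total += differences / compared
            pairs += 1
    return total / pairs if pairs else float("nan")
```

A family that expanded recently yields many near-identical pairs and therefore a low value, whereas RIP or long divergence times raise the pairwise differences.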

Fig 4.

Characteristics of high-copy TE families: TE families are ordered from the highest copy numbers (left) to the lowest copy numbers (right) in all 19 analyzed genomes combined. (A) Distribution of total copy number per TE family and isolate. (B) GC content distribution per TE family. The red line represents a GC content of 50%. (C) Length of the consensus sequence corresponding to the full-length consensus sequence excluding nested TEs or partial deletions. (D) Nucleotide diversity of the TE family (transformed as log10(nucleotide diversity*100,000)). (E) Number of RIP-like mutations (CpA<->TpA/TpG<->TpA) per TE copy, corrected for the length of the TE. (F) Correlation between copy numbers and consensus sequence lengths for TE families. Circle size corresponds to the mean number of RIP-like mutations and the color indicates the nucleotide diversity.
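The RIP-like mutation count in panel (E) can be approximated with a short Python sketch that compares a TE copy against an aligned consensus sequence. The sketch counts changes in one direction only (consensus CpA or TpG read as TpA in the copy) and treats adjacent alignment columns as adjacent bases; both are simplifying assumptions, and the function names are hypothetical.

```python
def rip_like_mutations(consensus, copy):
    """Count RIP-like changes in `copy` relative to an aligned `consensus`.

    A change is counted when a consensus CpA dinucleotide reads TpA in the
    copy (C-to-T on the forward strand) or a consensus TpG reads TpA (the
    same event on the reverse strand). Dinucleotides containing a gap never
    match and are therefore ignored.
    """
    if len(consensus) != len(copy):
        raise ValueError("sequences must be aligned to the same length")
    consensus, copy = consensus.upper(), copy.upper()
    count = 0
    for i in range(len(consensus) - 1):
        ref, obs = consensus[i:i + 2], copy[i:i + 2]
        if obs == "TA" and ref in ("CA", "TG"):
            count += 1
    return count


def rip_per_kb(consensus, copy):
    """RIP-like changes per 1,000 ungapped bases of the copy."""
    length = sum(base != "-" for base in copy)
    if not length:
        return float("nan")
    return 1000 * rip_like_mutations(consensus, copy) / length
```

Normalizing by the ungapped length of the copy is one way to apply the length correction mentioned in the caption, so that short fragments and full-length copies can be compared.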

Transposable element expansion routes identify genomic niches of proliferation

To disentangle the factors influencing recent TE bursts, we first calculated genetic distances among copies (Fig 5A). Most TE families show their highest activity in a similar, recent age range. We found two TE families with ongoing activity (Styx, an IS3EU DNA transposon, and RLX_LARD_Thrym). Among the high-copy TE families, RII_Cassini has been most recently active. RLG_Luna, RLG_Sol, RIX_Lucy and RLC_Deimos have undergone earlier bursts with both RLC_Deimos and RLG_Luna showing indications of multiple episodes of rapid proliferation. To reconstruct TE expansion routes in the genome, we built phylogenetic trees (available on Zenodo, https://zenodo.org/record/7344421), rooted using TE copies found in Zymoseptoria sister species. Using tree reconstruction, we assessed for each TE copy whether characteristics of the TE sequence and the genomic niche evolved from the parental node in the tree. We used ancestral state reconstruction to identify niche and sequence features of all internal nodes on each family’s tree. Compared to the ancestral state, we found increases in GC content of the TE sequence, as well as the GC and gene content of the genomic niche (Fig 5B). The distance to the closest gene decreased, and TEs are in regions with a lower TE density compared to their direct ancestors (Fig 5B). While most TE copies are located on a different chromosome compared to the parental node, new insertions typically remain on core chromosomes (65.4%) or switch from an accessory to a core chromosome (21.5%). We found that more than half of the TE copies remain in isochores of low GC content (58.4%) or jumped from moderate to low GC content (20.5%). Additionally, a large part of TE copies either remain in large RIP-affected regions or jumped into such regions (20.7%).
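The ancestor-to-offspring comparison described above can be sketched in Python once ancestral states are available for internal nodes. The example assumes a tree stored as a child-to-parent mapping and node states supplied by a separate ancestral state reconstruction step; the data layout and function names are hypothetical and do not reproduce the software used in the study.

```python
from collections import Counter


def transition_proportions(parent_of, state):
    """Proportion of parent-to-child state transitions over all tree edges.

    `parent_of` maps each non-root node to its parent; `state` maps every
    node to a category such as "core" or "accessory".
    """
    counts = Counter(
        (state[parent], state[child]) for child, parent in parent_of.items()
    )
    total = sum(counts.values())
    return {edge: n / total for edge, n in counts.items()}


def numeric_changes(parent_of, value):
    """Child minus parent value for each edge, e.g. for niche GC content."""
    return {
        child: value[child] - value[parent]
        for child, parent in parent_of.items()
    }
```

A positive entry in `numeric_changes` means that the metric increased relative to the reconstructed ancestor, which matches the sign convention used for Fig 5B.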

Fig 5.

Genomic origin and features of TE transposition bursts: (A) Repeat landscape of the TE families with the highest copy numbers. Colors indicate the highest copy numbers (RII_Cassini, RLC_Deimos, RLG_Sol, RLG_Luna), TEs with multiple bursts (RLC_Deimos, RIX_Lucy and RLG_Luna) or very recent burst (Styx, Thrym). DTX_MITEs_Goblin is included but shows no apparent activity. (B) Normalized range of characteristics of TE copies and their genomic niche compared between the estimated ancestral states and derived copies. A positive value indicates that metrics increased compared to the ancestor sequence, and a negative value indicates a decrease. (C) Scheme of the definition of burst and burst outgroups based on phylogenetic trees of TE families. Green indicates the copies of a burst with low terminal branch lengths and the red outgroup indicates the closest related sister branch of a burst. (D) Distribution of niche and TE copy characteristics of copies belonging to a burst clade (blue) compared to all other copies (red).

We identified individual bursts within TE families by retrieving clades of highly similar sequences distinct from other sequences in the tree (Fig 5C). We defined the outgroup of individual bursts as being closest to the parental copy that preceded the burst. Overall, around 50% of TE families experienced at least one recent burst and 10% (n = 32) of all TE families revealed several bursts. Copies resulting from individual bursts are often found only in a subset of the analyzed genomes of the species, consistent with TE bursts being recent. Yet, most bursts are composed of copies originating from multiple genomes. In MITE families, a large proportion of all copies likely originate from a recent burst. To identify general properties of TE bursts, we compared copies created in a recent burst with all other copies of that TE family (Fig 5D). Burst copies generally have a higher GC content, fewer RIP-like mutations, are closer to genes and are more distant to other TEs compared to non-burst copies (Fig 5D). We also found that burst copies tend to occupy genomic niches with lower TE density, are less likely to overlap with large RIP-affected regions and are located in niches with a higher GC content.
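One way to make the notion of a burst clade concrete is to search a tree for maximal clades in which all branches are short. The Python sketch below illustrates this idea on a tree encoded as nested tuples. The branch-length threshold, the minimum clade size and the tuple encoding are arbitrary assumptions chosen for illustration and are not the criteria applied in the study.

```python
def find_bursts(node, max_branch=0.01, min_copies=3):
    """Return maximal clades in which every branch below the clade root is short.

    `node` is a (name, branch_length, children) tuple, where `children` is a
    list of nodes and leaves have an empty list. Each burst is returned as a
    sorted list of leaf names.
    """
    bursts = []

    def visit(n):
        # Returns the leaf names below n if all branches below n are short,
        # otherwise None.
        name, _, children = n
        if not children:
            return [name]
        leaf_sets = [visit(child) for child in children]
        tight = all(
            leaves is not None and child[1] <= max_branch
            for child, leaves in zip(children, leaf_sets)
        )
        if tight:
            return [leaf for leaves in leaf_sets for leaf in leaves]
        # n itself is not a burst: report each tight child clade of sufficient size.
        for leaves in leaf_sets:
            if leaves is not None and len(leaves) >= min_copies:
                bursts.append(sorted(leaves))
        return None

    leaves = visit(node)
    if leaves is not None and len(leaves) >= min_copies:
        bursts.append(sorted(leaves))
    return bursts
```

In this encoding, the sister branch of a reported clade plays the role of the burst outgroup shown in Fig 5C; identifying it requires only a lookup of the siblings of the clade root.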

Niche characteristics of transposable element activation

We focused on five TE families with particularly high copy numbers and evidence for large bursts to identify drivers of expansions (i.e., the LINE/I RII_Cassini, the LTRs RLG_Luna, RLG_Sol and RLC_Deimos, and the MITE DTX_MITE_Goblin). Even though RLG_Luna and RLG_Sol both contain among the highest copy numbers, they show no indication of recent bursts (Fig 6). In comparing niche characteristics of older TE copies with copies generated during bursts, we detected no universal pattern shared by all TE families. There are indications that TEs that are part of a burst are in regions with a reduced impact of RIP, showing generally higher GC content (Fig 6A) and being located in regions with a lower LRAR (Fig 6B). Nevertheless, there is no significant difference in the GC content of niches between older and burst TE copies (Fig 6C). Most TEs remain at a larger distance from genes; only copies of RLC_Deimos show a clear trend to be closer to genes for copies generated during a burst (Fig 6D). Analyzing the 38 RLC_Deimos copies that inserted into genes in the reference genome IPO323, three gene annotations (7.9%) encode Copia-like functions, suggesting that a TE copy was wrongly annotated as a gene. Another eight genes (21.1%) lack any known function (including TE functions) and the remaining 27 genes (71.1%) have predicted protein functions unrelated to any known TE. Compared to the consensus sequence, TEs in a burst have a higher accumulation of mutations in RLC_Deimos (Fig 6E), yet the mutation rate increase seems not to stem from RIP-like mutations (Fig 6F). Other TE families share no similar pattern of temporal escape from RIP facilitating a burst.

Fig 6.

High copy number TE families and characteristics of burst initiation: Comparison of copies in bursts (red) and all other copies (blue) for the five TE families with the highest copy numbers. The TE families RLG_Luna and RLG_Sol have no copies assigned to burst clades (indicated by NA). The mutation rate was corrected for the length of the fragment and the number of copies across genomes.

Escape from RIP by RLC_Deimos is supported by the fact that the TE shows only a single large burst following slow diversification (Fig 7A). The burst copies show clear changes both in niche characteristics and in sequence characteristics. The number of RIP-like mutations decreases gradually for copies closer to the burst, and new copies generated by the burst seem unaffected by RIP (Fig 7A). The TE niche retains a GC content close to the genome-wide GC content for both old and young copies (Fig 7B), yet TE copies in and close to the burst are almost exclusively inserted into niches devoid of RIP signatures (Fig 7C; see low LRAR). RLC_Deimos copies in general have a low GC content, yet the copies in the burst seem to have regained a higher GC content (Fig 7D). Finally, RLC_Deimos copies generated at the onset and during the burst have consistently been closer to genes compared to older pre-burst copies (Fig 7E). Interestingly, one of the putative copies leading to the burst of RLC_Deimos is found only in a single genome and inserted into a gene encoding an alpha/beta hydrolase.

Fig 7.

Phylogenetic reconstruction of the Deimos Copia retrotransposon: (A) Phylogenetic tree with colors indicating the number of RIP-like mutations. The black bar marks the different burst clades. The dot plot shows the changes in RIP-like mutations from the estimated ancestral state to the offspring for all internal and terminal branches from the ancestral state reconstruction. (B-E) Phylogenetic trees and ancestor-offspring changes for (B) the GC content of the niche, (C) the overlap of the niche with large RIP affected regions, (D) the GC content of the fragment and (E) the distance to the closest gene.

In contrast to RLC_Deimos, RII_Cassini has likely undergone six individual bursts of which two are very recent and specific to a single analyzed reference genome (from Australia and Canada, respectively). The tree structure shows two bigger clades, with one showing a generally decreased number of RIP-like mutations (Fig 8A). All of the bursts are located in a subclade with a decreased impact of RIP. Yet, the GC content of the niche is variable (Fig 8B), and almost all copies are located in regions with a high LRAR (Fig 8C). Copies of RII_Cassini generated during bursts have a higher GC content (Fig 8D). The changes in GC content and in the influence of RIP seem to be gradual, as most terminal branches have similar values compared to the reconstructed ancestral state. Niche characteristics including GC content, large RIP-affected regions or the distance to the closest gene are generally more weakly correlated between ancestor copies and derived copies compared to RLC_Deimos (Fig 8, dot plots).

Fig 8.

Phylogenetic reconstruction of the Cassini retrotransposon: (A) Phylogenetic tree with colors indicating the number of RIP-like mutations. The black bar marks the different burst clades. The dot plot shows the changes in RIP-like mutations from the estimated ancestral state to the offspring for all internal and terminal branches based on the ancestral state reconstruction. (B-E) Ancestor-offspring changes for (B) the GC content of the niche, (C) the overlap of the niche with large RIP affected regions (LRAR), (D) the GC content of the fragment and (E) the distance to the closest gene.

DTX_MITE_Goblin generally has high copy numbers among genomes with similar numbers and in orthologous positions, consistent with TE activity in the early history of the species rather than very recently. The expansion of the DTX_MITE_Goblin is characterized by a high number of bursts (S3 Fig). Copies from individual bursts are typically shared among all genomes. Despite high nucleotide diversity, few mutations were generated by RIP. The DTX_MITE_Goblin family might consist of older TE copies shared between genomes, that are not affected by RIP. Taken together, high copy number TEs tend to have very recent burst origins, a higher GC content compared to the other copies and an ability to evade genomic defenses.

Discussion

TE activity is an important disruptor of genomic integrity and the potential for deleterious effects is strongly influenced by selection and insertion site preferences. TE dynamics also play key roles in pathogenicity evolution by modulating effector functions and expression in fungal plant pathogens. Our joint analyses of 19 complete Z. tritici genomes demonstrate that multiple TE families have been highly active in the recent evolutionary history of this species. A substantial number of TEs have produced one or multiple bursts of proliferation, and burst copies have distinct niche characteristics, being closer to genes and having a higher GC content than copies predating bursts. Recent bursts have mostly produced TE copies of the non-autonomous MITEs and the Copia family RLC_Deimos. MITEs likely escape detection by genomic defenses including RIP due to their short length. However, the long copies of RLC_Deimos were strongly affected by RIP prior to the bursts. The recent proliferation of RLC_Deimos copies without apparent accumulation of RIP-like mutations suggests reduced efficacy of RIP in general. Reduced RIP activity in some genomes is consistent with mutations and losses in the dim2 gene implicated in RIP. Overall, genomic defense mechanisms were only partially effective at preventing the proliferation of TEs. Beyond the escape from RIP, our analyses indicate that a key trigger point for initiating new bursts is likely the successful insertion close to coding sequences. Hence, the genomic environment appears to play a key role in the evolutionary success of TE families.

We identified the emergence and expansion of numerous clades within TE families consistent with bursts of rapid copy number expansion. TE families often include large numbers of inactive copies that accumulate mutations and are unlikely to cause new insertions. Consistent with the dynamic nature of TE families, many TEs of Z. tritici are not detectable in the genomes of sister species. In particular, MITE families are most often absent in the sister species. This is consistent with rapid divergence of these non-autonomous elements devoid of coding sequences. Searching for more distant homologs to Z. tritici TEs might help identify additional putative ancestors, as observed in other species [45,68]. Rapid proliferation of new TE copies is highly uneven among TE families, with a small subset of families driving the majority of recent insertions. Such new insertions have persisted despite potential purifying selection against new insertions, silencing or mutations introduced by the genomic defenses (i.e., RIP) shared by many ascomycete plant pathogens. TE families such as RLC_Deimos were successfully repressed after an initial period of activity, but were more recently reactivated, starting a new branch or “subfamily” that originated from a particular subset of TE copies in the genome [49].

Active TEs that generate identical copies typically trigger defense mechanisms. Hence, such TE families are the most likely to be highly affected by RIP mutations. However, our results suggest RIP has only a weak impact on young TEs. Weak RIP signatures were predominantly found in MITEs, which supports the idea that smaller TEs most successfully evade recognition, as seen in N. crassa or Parastagonospora nodorum [19,69]. Longer elements show stronger impacts of RIP using both GC content and RIP-like mutations as proxies. We found evidence for RIP mostly in TE copies closer to the root, while copies generated during recent bursts show nearly no RIP signatures. Escaping the effects of RIP may be a prerequisite for the initiation of a burst, hence the strong association of genetic distance and RIP mutations. RIP has mostly been studied in the ascomycete N. crassa, where RIP introduces a large number of mutations after just one generation of sexual recombination [18]. The life cycle of Z. tritici is thought to consist of several cycles of asexual reproduction during the growing season, and only one round of sexual reproduction at the end of the season [70]. Hence, TEs may proliferate despite RIP for short periods, but ubiquitous sexual reproduction should provide ample opportunity for RIP to act on copies of any recent burst. Evidence for RIP mutations is widespread in the genome of Z. tritici and the machinery for RIP is present in at least some isolates in the Middle East, the center of origin of the species [17]. Species-wide analyses have suggested that newly established populations of the pathogen have lost a functional RIP machinery [22,62,63]. Two isolate-specific bursts of RII_Cassini in the Canadian and Australian strain genomes might indicate a spread of TEs after the loss of RIP. The geographic and temporal variation of RIP-mediated TE control in Z. tritici adds significant complexity to predicting triggers of TE bursts. As the pathogen shows high degrees of gene flow even among continents, genotypes recovered from regions without active RIP may still carry chromosomal segments with recent RIP activity due to admixture events introducing such RIP-affected regions. Hence, individual genomes are unlikely to carry homogeneous signatures of either RIP or consequences of TE activation due to a loss of RIP.

The only burst in RLC_Deimos separates presumably older copies with very low GC content from burst copies driving the creation of a TE subfamily. The exact trigger of the RLC_Deimos expansion is unknown. However, the ladder-like structure in the phylogeny predating the burst suggests a slow but regular pace of creating additional copies until the TE gained the ability to expand through a burst and create a large number of new copies. Loss of the RIP machinery would help preserve nearly identical copies of recently duplicated TE sequences and would maintain GC content at high levels. To what extent the loss of RIP has shaped the distribution of GC content across the genome, including TE copies, remains unknown. The absence of RIP-like mutations in nearly all recently expanded TE families strongly suggests that recent TE proliferation was enabled by weakened genome defenses in the species. Yet, the absence of RIP cannot explain the strong increase in GC content in TE copies of recent bursts, especially for RLC_Deimos. The GC content is expected to remain low unless mechanisms such as GC-biased gene conversion increase GC content. Gene conversion is thought to have only a weak impact on the Z. tritici genome [71].

A mechanistic understanding of triggers activating TEs in fungal pathogens is largely lacking, yet here we show the importance of the niche harboring TE copies. Young TE copies have distinct associations with particular genomic niches compared with older copies, which tend to be located in niches with high TE content. In contrast, copies triggering recent bursts and the resulting burst copies themselves tend to be inserted closer to genes. However, the association of young TE copies with genomic niches is most likely confounded by the action of purifying selection. As TEs can disrupt coding sequences or change expression profiles of neighboring genes, purifying selection is most likely strongest against TEs inserting into gene-rich niches. In contrast, TEs inserting into TE-rich niches, possibly creating nested TE copies, are likely to have only a minor impact on fitness. In some fungal plant pathogens including Z. tritici, such nested insertions led to the compartmentalization of niches with high TE density and niches mostly composed of genes [72]. Repeated insertions of TEs into such TE-rich niches likely exacerbated genome compartmentalization. Selection is also expected to act on the effectiveness of defenses against TEs. Silencing or hypermutation of TEs close to genes may disrupt the functionality of the genes as well. Hence, the efficiency of genomic defenses against TEs may be weakened by selection. It is thus conceivable that otherwise silenced TEs can remain both functional and active when inserted close to a gene. As the TE copies created during the RLC_Deimos burst are significantly closer to genes, this suggests that the proximity to genes provides a “secure niche”, in which TEs will not be as efficiently targeted by defense mechanisms. RLC_Deimos copies predating the burst are generally in niches with a strong impact of RIP (i.e., LRAR), yet all elements generated during the burst are in regions outside of LRAR. As LRAR are mostly made of TEs and other repeats, such regions are likely marked by heterochromatin and repressed. The insertion of burst TE copies into euchromatic regions close to genes may have helped sustain TE activity. However, either the benefits of inserting closer to genes to trigger bursts are not universal among TEs, or insertion site preferences prevent switches from low GC to high GC content regions (i.e., close to genes). This may be the case for RII_Cassini, which showed no clear association between distance to genes and burst triggers. Similarly, copies of DTX_MITE_Goblin may have a universal preference for coding regions.

Beyond the benefit of weaker genomic defenses, the propensity of bursts being initiated by TEs inserting near genes may also be related to beneficial impacts of the TE copy itself. We found that copies at the start of bursts tend to have higher allele frequency within the species (i.e., present in most analyzed genomes; see also S4 Fig). Larger pathogen population genomic datasets will enable the analysis of selection acting on parental copies initiating bursts. It is conceivable though that the beneficial effects of an individual TE copy are linked to triggers of TE bursts. However, selection at the organismal level driven by fitness benefits conferred by specific TE copies (e.g., beneficial modifications of effector gene functions) is compounded by higher proliferation rates, which benefit the TEs themselves. Such multi-level selection was recently suggested to represent a Devil’s Bargain in plant pathogens, trading short-term benefits of TE copies against longer-term risks of genome expansions [35]. Assessing fitness effects of individual TE copies along with their expansion history will enable further hypothesis testing about the proximate drivers of TE expansions over a range of evolutionary time scales. Resolving drivers of TE dynamics will help predict the risk individual pathogens pose and how adaptive evolution is likely to proceed. This will require an integration of genome-wide TE dynamics with the consequences for host-pathogen interactions, and ultimately improve our mechanistic understanding of rapid evolutionary processes in plant pathogens.

Methods

Genome sequences and transposable element detection

We used a set of 19 reference-quality genomes of Z. tritici assembled using PacBio sequencing [52; European Nucleotide Archive BioProject PRJEB33986]. The genomes cover the global genetic diversity of the species with isolates originating from 14 countries and six continents (Fig 1A and S1 Table). We created the map using ggmap version 3.0.0 [73]. Map data originated from OpenStreetMap: https://www.openstreetmap.org/copyright. We used an improved TE annotation for the species with elements retrieved from all assembled genomes [52]. TE annotation steps included using RepeatMasker, LTR-Finder, MITE-Tracker, SINE-Finder, Sine-Scan and extensive manual curation with WICKERsoft. Elements were named based on the three-letter code [45,52,74–81]. The primary TE annotation was followed by stringent filtering steps to detect nested insertions and to join TE fragments. Simple repeats, low complexity regions and elements smaller than 100 bp were removed. TEs belonging to the same family overlapping by more than 100 bp were merged. TEs belonging to different families overlapping by more than 100 bp were considered as nested insertions. TEs belonging to the same family separated by less than 200 bp were considered as fragmented TEs and merged into a single element [52]. We additionally annotated TEs using the same pipeline in high quality genomes of the sister species Z. ardabiliae, Z. brevis, Z. pseudotritici and Z. passerinii [82].
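The filtering rules above can be illustrated with a minimal Python sketch. It is not the pipeline used for the annotation [52]; the `TE` record, the function names and the half-open coordinate convention are assumptions made for illustration. Only the three thresholds (100 bp minimum length, 100 bp overlap, 200 bp gap) come from the text. Same-family overlaps of 100 bp or less are not covered by the stated rules and are left unmerged here.

```python
from dataclasses import dataclass

MIN_LEN = 100      # elements smaller than 100 bp are removed
MIN_OVERLAP = 100  # overlaps must exceed 100 bp to count
MAX_GAP = 200      # same-family fragments closer than 200 bp are joined


@dataclass
class TE:
    # Hypothetical record; coordinates are 0-based, half-open
    chrom: str
    start: int
    end: int
    family: str


def should_join(prev, cur):
    """Same-family rule: overlap > 100 bp, or separation < 200 bp."""
    if prev.chrom != cur.chrom or prev.family != cur.family:
        return False
    gap = cur.start - prev.end  # negative when the two copies overlap
    return gap < -MIN_OVERLAP or 0 <= gap < MAX_GAP


def merge_same_family(tes):
    """Drop short elements, then join overlapping or fragmented copies."""
    kept = sorted((t for t in tes if t.end - t.start >= MIN_LEN),
                  key=lambda t: (t.chrom, t.family, t.start))
    merged = []
    for t in kept:
        if merged and should_join(merged[-1], t):
            merged[-1].end = max(merged[-1].end, t.end)
        else:
            merged.append(TE(t.chrom, t.start, t.end, t.family))
    return merged


def overlap(a, b):
    """Number of shared bases between two elements (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def nested_pairs(tes):
    """Pairs from different families overlapping by more than 100 bp."""
    return [(a, b) for i, a in enumerate(tes) for b in tes[i + 1:]
            if a.family != b.family and overlap(a, b) > MIN_OVERLAP]
```

In this sketch, merging runs before the nested-insertion check so that fragments of one element are not reported as several nested pairs. The actual order of steps in the pipeline is not stated in this section.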

Multiple sequence alignments

We created multiple sequence alignments for all copies belonging to the same TE family from the 19 Z. tritici and four sister species genomes. We extracted all sequences of TE families with copy numbers ≥ 20 with the function faidx in samtools version 1.9 [83]. In case of fragmented elements, we extracted all fragments as individual copies. We reverse-complemented sequences where necessary prior to sequence alignment. To extract coding regions, we performed blastx searches against the PTREP18 database (Thomas Wicker; http://botserv2.uzh.ch/kelldata/trep-db/index.html) and against the non-redundant protein database from NCBI (09/2020) with diamond blast version 0.9.32.133, and selected the hit with the highest bit score with at least 200 bp length [84,85]. For small non-autonomous TE families lacking a coding region, we retained the entire sequence. We created multiple sequence alignments for each family with MAFFT version 7.453 and the following parameters: --thread 1 --reorder --localpair --maxiterate 1000 --nomemsave --leavegappyregion [86]. For four TE families with high copy numbers and large coding regions (RII_Cassini, RLG_Luna, RLG_Sol, RLC_Deimos), we used the faster but slightly less accurate --6merpair option instead of --localpair.
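Two of the steps above, reverse-complementing a copy and picking the best blastx hit, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: hits are represented as dictionaries keyed by the standard tabular field names `qstart`, `qend` and `bitscore`, and the 200 bp minimum is assumed to apply to the span of the hit on the nucleotide query. The text does not specify how hit length was measured.

```python
# Translation table for complementing DNA; N is left unchanged
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_hit(hits, min_len_bp=200):
    """Return the hit with the highest bit score among hits spanning
    at least min_len_bp on the query, or None if no hit qualifies.

    Assumption: length is the query span |qend - qstart| + 1, which
    also handles hits on the reverse strand (qstart > qend).
    """
    eligible = [h for h in hits
                if abs(h["qend"] - h["qstart"]) + 1 >= min_len_bp]
    return max(eligible, key=lambda h: h["bitscore"], default=None)
```

A hit reported with `qstart` greater than `qend` lies on the reverse strand, which is one way to decide whether a copy needs to be reverse-complemented before alignment.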

TE family divergence

TE families are expected to be active during different time spans and evolve at different rates. To estimate the genetic distance of the TE families, we ran RIPCAL with --windowsize 1000 --model consensus to create an additional consensus sequence that includes all copies of a TE family [87,88]. In R we created DNAbin objects with the R package ape version 5.3 and calculated nucleotide diversity of the multiple sequence alignments for each TE family with nuc.div in the package pegas version 0.13 [89–91]. To compare between TE families, we divided the nucleotide diversity by the length of the corresponding TE coding region. We estimated the genetic distance of TE bursts per family using RepeatMasker. To compare recent activity or bursts, we created a repeat landscape using buildSummary, calcDivergenceFromAlign (with Kimura divergence) and createRepeatLandscape in RepeatMasker and visualized the results with ggplot [92].
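As a point of reference for the diversity estimate, nucleotide diversity is the mean proportion of differing sites over all pairs of aligned sequences. The Python sketch below shows that calculation; it is not the pegas implementation. Comparing only positions where both sequences carry A, C, G or T is an assumption of this sketch, and pegas may treat gaps and ambiguous bases differently.

```python
from itertools import combinations


def pairwise_difference(a, b):
    """Proportion of differing sites between two aligned sequences.

    Only columns where both sequences have A, C, G or T are compared.
    Returns None when no column is comparable.
    """
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differences += 1
    return differences / compared if compared else None


def nucleotide_diversity(seqs):
    """Mean pairwise difference over all comparable sequence pairs."""
    values = []
    for a, b in combinations(seqs, 2):
        d = pairwise_difference(a, b)
        if d is not None:
            values.append(d)
    return sum(values) / len(values) if values else 0.0
```

The number of pairs grows quadratically with copy number, so for families with thousands of copies a vectorized or sampled version would be preferable.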

Genomic environment of TE copies

We described the genomic characteristics of niches containing TE copies. The distance between genes in the Z. tritici reference genome IPO323 is estimated to be around 2kb, the genome-wide GC is 51.7%, and the TE content is 19.1% [52]. Data on TE, genes, large RIP affected regions and histone mark distributions are shown in S5 Fig. To characterize the genomic environment, including gene and TE content, we created 5kb windows up- and downstream of TE copies. Then, we calculated the TE and gene content based on TE and gene annotations, respectively, using the intersect command in bedtools version 2.28.0. We calculated GC content with the geecee tool in EMBOSS version 6.6.0 [93–95]. We also calculated the distance to the closest gene and TE with the closest command in bedtools. We used Occultercut version 1.1 with default parameters to detect isochores with low (≤ 49%) or moderate (> 49%) GC content [96]. We used The RIPper to identify large RIP affected regions in all analyzed genomes, and calculated the overlap of TE copies and RIP affected regions with bedtools intersect [97]. For the reference genome IPO323, we used available ChIP-seq information (http://ascobase.cgrb.oregonstate.edu/cgi-bin/gb2/gbrowse/ztitici_public/) to define the chromatin structure in niches around TEs [64]. To reduce effects of TE characteristics in downstream analyses, we grouped TEs into the major categories: MITEs, retrotransposons (RLC and RLG), LINE and others.
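The niche definition can be made concrete with a short Python sketch of the window and GC calculations. The function names and the half-open coordinate convention are assumptions for illustration; the analysis itself used bedtools and EMBOSS. The `gc_class` helper only applies the reported 49% cutoff to a single value and does not reproduce the genome segmentation performed by Occultercut.

```python
def flanking_windows(start, end, chrom_len, size=5000):
    """Up- and downstream windows of a TE copy, clipped to the chromosome.

    Coordinates are 0-based, half-open (start, end).
    """
    upstream = (max(0, start - size), start)
    downstream = (end, min(chrom_len, end + size))
    return upstream, downstream


def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / total


def gc_class(gc, threshold=0.49):
    """Label a GC fraction as low (<= 49%) or moderate (> 49%)."""
    return "low" if gc <= threshold else "moderate"
```

For a TE near a chromosome end, one window is shorter than 5 kb, so content measures are best expressed as fractions of the actual window length rather than absolute base counts.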

Characteristics of TE copies

Many TE sequences in the genome are fragmented due to nested insertions or partial deletions. To improve the quality of multiple sequence alignments, we selected only TE coding regions for phylogenetic analyses (S6 Fig). We extracted the coding regions from multiple sequence alignments with extractalign from EMBOSS, based on the position of the coding sequences. We removed all sequences not covered by the coding region from the multiple sequence alignment with trimAl version 1.4.rev15 (-gt 0; http://trimal.cgenomics.org), and removed fragments that contained more than 50% gap positions in the coding region [98]. For downstream phylogenetic analyses, we exclusively used the filtered multiple sequence alignment of the coding regions. We calculated the GC content of each TE coding region with geecee in EMBOSS. To quantify RIP-like mutation signatures, we extracted dinucleotide frequencies for each TE family alignment with count in the package seqinr [99]. To define locus-specific TE dynamics, we first identified the closest up- and downstream fixed orthologous genes based on the annotation of the pangenome with closest in bedtools [52]. Next, we defined TE copies belonging to the same TE family and located between the same fixed orthologous genes as orthologous TE groups. As most TE copies are singletons or only present in a few isolates, we did not further investigate orthologous TE copies. Visualizations were made with ggplot [100].
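The gap filter and the dinucleotide counts can be sketched as follows. This is an illustrative Python version, not the trimAl or seqinr code. Representing the alignment as a name-to-sequence dictionary, and removing gap characters before counting overlapping dinucleotides, are both assumptions of the sketch.

```python
from collections import Counter


def drop_gappy(alignment, max_gap_fraction=0.5):
    """Keep aligned sequences with at most 50% gap positions.

    alignment: dict mapping copy name to its aligned sequence.
    """
    return {name: seq for name, seq in alignment.items()
            if seq and seq.count("-") / len(seq) <= max_gap_fraction}


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides after removing gap characters."""
    s = seq.replace("-", "").upper()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))
```

Because RIP converts C to T, comparing dinucleotide counts of a copy with those of its ancestor or family consensus gives a simple proxy for RIP-like changes.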

Maximum likelihood trees

We estimated maximum likelihood trees for all TE families with indications for recent activity and bursts in the species. We extracted conserved blocks of the coding region with Gblocks version 0.91b, using the following parameters: -t=d -b3=10 -b4=5 -b5=a -b0=5 [101]. For each TE family, we included two sequences retrieved from the same TE in sister species genomes to root trees. We estimated maximum likelihood trees with RAxML version 8.2 [102]. For this, we generated 20 ML trees, each with a different starting tree, and extracted the tree with the best likelihood with the following parameters: raxmlHPC-PTHREADS-SSE3 -T 4 -m GTRGAMMA -p 12345 -# 10 --print-identical-sequences. We performed a bootstrap analysis to obtain branch support values with the following parameters: raxmlHPC-PTHREADS-SSE3 -T 4 -m GTRGAMMA -p 12345 -b 12345 -# 50 --print-identical-sequences. Finally, we added bipartitions on the best ML tree with the following parameters: raxmlHPC-PTHREADS-SSE3 -T 4 -m GTRGAMMA -p 12345 -f b --print-identical-sequences.

Ancestral state reconstruction

We performed ancestral state reconstruction for each TE family, using characteristics of either the TE sequences or the niche of the TE copy as the phenotype. We imported the best scoring ML trees created in RAxML into R using read.tree from the package treeio version 1.10.0 [103]. We rooted trees with root in the R package ape, using sister species sequences as outgroups. We converted tree objects to tibble objects with as_tibble from the package tibble version 3.0.1 in tidyverse version 1.3.0 and assigned each sequence in the tree metadata on TE characteristics using dplyr version 0.8.5 in tidyverse [104–106]. We then converted the tibble objects back to tree formats with as.treedata in treeio [105]. Using fastAnc and contMap from package phytools version 0.7-47, we performed ancestral state reconstruction for the following continuous traits: gene density, TE density and GC content of 2.5 kb niches surrounding the TEs, distance to the closest gene, and GC content and RIP-like mutations per bp of the TE copy [107]. To estimate ancestral states for binary characteristics (GC rich vs. poor, core vs. accessory chromosome), we used make.simmap from the package phytools with an equal rates model and 100 simulations. We visualized the trees with ggtree version 2.0.1 [108]. To retrieve clades representing recent bursts, we created polytomy trees from binary trees, using the command CollapseNode from TreeTools version 1.4.4 in R at branch lengths smaller than 1.1e-05 [109]. For each burst clade, we defined the parental branch and an outgroup of the clade with offspring in the treeio package as the ancestral branch. The outgroup represents a copy outside but close to the burst clade. We compared niche metrics of ancestral branches of bursts with the distribution of the same metric in all elements outside of bursts. We performed association mapping for shared characteristics along the phylogenetic tree using treeWAS [110].
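The idea behind retrieving burst clades, collapsing very short internal branches so that nearly identical copies fall into one polytomy, can be illustrated with a small Python sketch. It does not reproduce the TreeTools function. The `Node` class and both function names are hypothetical, and adding the collapsed branch length to the child branches (which keeps root-to-tip distances unchanged) is a design choice of this sketch rather than documented behavior of CollapseNode. Only the 1.1e-05 threshold comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical minimal tree node; length is the branch to the parent
    name: str = ""
    length: float = 0.0
    children: list = field(default_factory=list)


def collapse_short_branches(node, threshold=1.1e-05):
    """Remove internal branches shorter than threshold, in place.

    Children of a removed internal node are attached to its parent,
    which turns runs of short bifurcations into a single polytomy.
    """
    new_children = []
    for child in node.children:
        collapse_short_branches(child, threshold)
        if child.children and child.length < threshold:
            for grandchild in child.children:
                grandchild.length += child.length
                new_children.append(grandchild)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def polytomies(node, min_size=3):
    """List nodes with at least min_size children (candidate bursts)."""
    found = [node] if len(node.children) >= min_size else []
    for child in node.children:
        found.extend(polytomies(child, min_size))
    return found
```

Each polytomy returned this way corresponds to a candidate burst clade; its parent branch and nearest sister copy would then serve as the ancestral branch and outgroup described above.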

Supporting information

S1 Fig. Hierarchy of TE superfamilies: Classes, subclasses, orders, superfamilies as well as the three-letter code according to Wicker et al (2007) [45].

The Z. tritici specific family names are according to Badet et al (2020) [52].

(PDF)

S2 Fig. Characteristics of high-copy TE families: TE families are ordered from the highest copy numbers (left) to the lowest copy numbers (right) in all 19 analyzed genomes combined.

(A) Total copy numbers. (B) Long (> 0.00001; red) and short (≤0.00001; blue) terminal branch lengths of individual copies characterizing two classes of divergence times.

(PDF)

S3 Fig. Phylogenetic tree of the TE family DTX_MITE_Goblin: (A) Phylogenetic tree with colors indicating the number of RIP-like mutations.

The black bar marks the different burst clades. The dot plot shows the changes in RIP-like mutations from the ancestor to offspring for all internal and terminal branches from the ancestral state reconstruction. (B-E) Phylogenetic trees and ancestor-offspring changes for (B) the GC content of the niche, (C) the overlap of the niche with large RIP affected regions, (D) the GC content of the copy and (E) the distance to the closest gene.

(PDF)

S4 Fig. TE copy frequency.

Comparison of TE copy frequency between outgroups of burst and all other TE copies.

(PDF)

S5 Fig. Genomic environment of the Z. tritici reference genome IPO323.

Circos plot showing the genomic environment of the reference genome IPO323 (Dutch isolate). Description from outside to inside contains the GC content, gene content and TE content in windows of 10kb, the presence of large RIP affected regions (LRAR), and the indication of the histone marks H3K4, H3K27 and H3K9. Chromosomes 1–13 are core chromosomes that are present in each isolate, while chromosomes 14–21 are accessory chromosomes, that are not shared among all isolates.

(PDF)

S6 Fig. Procedure to obtain multiple sequence alignments among copies of TE families.

Due to the high number of nested insertions and partially deleted fragments, we aligned only coding regions.

(PDF)

S1 Table. Features of the global pangenome of Zymoseptoria tritici used for transposable element analyses.

(XLSX)

S2 Table. Description of all TE copies analyzed: assigned TE family, information about isolate of origin, position in the genome, niche and TE sequence characteristics.

(XLSX)

Acknowledgments

We are very grateful for helpful comments on an earlier version of the manuscript by Emile Gluck-Thaler, for discussions about phylogenetic inference with Vinciane Mossion and statistical advice by Claudia Sarai Reyes-Avila.

Data Availability

The genome assemblies and annotations are available at the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under the BioProject PRJEB33986. TE consensus sequences, raw sequences and phylogenetic trees are available on Zenodo (https://zenodo.org/record/7344421).

Funding Statement

The authors received no specific funding for this work.

References

  • 1. Charlesworth B, Charlesworth D. The population dynamics of transposable elements. Genet Res. 1983;42: 1–27. doi: 10.1017/S0016672300021455
  • 2. Petrov DA, Aminetzach YT, Davis JC, Bensasson D, Hirsh AE. Size Matters: Non-LTR Retrotransposable Elements and Ectopic Recombination in Drosophila. Mol Biol Evol. 2003;20: 880–892. doi: 10.1093/molbev/msg102
  • 3. Le Rouzic A, Capy P. The first steps of transposable elements invasion: Parasitic strategy vs. genetic drift. Genetics. 2005;169: 1033–1043. doi: 10.1534/genetics.104.031211
  • 4. Casacuberta E, González J. The impact of transposable elements in environmental adaptation. Mol Ecol. 2013;22: 1503–1517. doi: 10.1111/mec.12170
  • 5. Fouché S, Badet T, Oggenfuss U, Plissonneau C, Francisco CS, Croll D. Stress-Driven Transposable Element De-repression Dynamics and Virulence Evolution in a Fungal Pathogen. Mol Biol Evol. 2020;37: 221–239. doi: 10.1093/molbev/msz216
  • 6. Lisch D. How important are transposons for plant evolution? Nature Review Genetics. 2013;14: 49–61. doi: 10.1038/nrg3374
  • 7. Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet. 2017;18: 71–86. doi: 10.1038/nrg.2016.139
  • 8. Lisch D, Bennetzen JL. Transposable element origins of epigenetic gene regulation. Curr Opin Plant Biol. 2011;14: 156–161. doi: 10.1016/j.pbi.2011.01.003
  • 9. Daboussi MJ, Capy P. Transposable Elements in Filamentous Fungi. Annu Rev Microbiol. 2003;57: 275–299. doi: 10.1146/annurev.micro.57.030502.091029
  • 10. Jacobs FMJ, Greenberg D, Nguyen N, Haeussler M, Ewing AD, Katzman S, et al. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature. 2014;516: 242–245. doi: 10.1038/nature13760
  • 11. Lisch D. Epigenetic regulation of transposable elements in plants. Annu Rev Plant Biol. 2009;60: 43–66. doi: 10.1146/annurev.arplant.59.032607.092744
  • 12. Yang P, Wang Y, Macfarlan TS. The Role of KRAB-ZFPs in Transposable Element Repression and Mammalian Evolution. Trends in Genetics. 2017;33: 871–881. doi: 10.1016/j.tig.2017.08.006
  • 13. Schmitz RJ, Lewis ZA, Goll MG. DNA Methylation: Shared and Divergent Features across Eukaryotes. Trends in Genetics. 2019;35: 818–827. doi: 10.1016/j.tig.2019.07.007
  • 14. Rebollo R, Romanish MT, Mager DL. Transposable Elements: An Abundant and Natural Source of Regulatory Sequences for Host Genes. In: Bassler BL, editor. Annual Review of Genetics, Vol 46. 2012. pp. 21–42. doi: 10.1146/annurev-genet-110711-155621
  • 15. Galagan JE, Selker EU. RIP: the evolutionary cost of genome defense. Trends in Genetics. 2004;20: 417–423. doi: 10.1016/j.tig.2004.07.007
  • 16. Gladyshev E, Kleckner N. Recombination-independent recognition of DNA homology for repeat-induced point mutation. Curr Genet. 2017;63: 389–400. doi: 10.1007/s00294-016-0649-4
  • 17. van Wyk S, Wingfield BD, De Vos L, van der Merwe NA, Steenkamp ET. Genome-Wide Analyses of Repeat-Induced Point Mutations in the Ascomycota. Front Microbiol. 2021;11. doi: 10.3389/fmicb.2020.622368
  • 18. Wang L, Sun Y, Sun X, Yu L, Xue L, He Z, et al. Repeat-induced point mutation in Neurospora crassa causes the highest known mutation rate and mutational burden of any cellular life. Genome Biol. 2020;21: 1–23. doi: 10.1186/s13059-020-02060-w
  • 19. Gladyshev E, Kleckner N. Direct recognition of homology between double helices of DNA in Neurospora crassa. Nat Commun. 2014;5. doi: 10.1038/ncomms4509
  • 20. González J, Lenkov K, Lipatov M, Macpherson JM, Petrov DA. High Rate of Recent Transposable Element-Induced Adaptation in Drosophila melanogaster. PLoS Biol. 2008;6: 2109–2129. doi: 10.1371/journal.pbio.0060251
  • 21. Horváth V, Merenciano M, González J. Revisiting the Relationship between Transposable Elements and the Eukaryotic Stress Response. Trends in Genetics. 2017;33: 832–841. doi: 10.1016/j.tig.2017.08.007
  • 22. Lorrain C, Feurtey A, Möller M, Haueisen J, Stukenbrock EH. Dynamics of transposable elements in recently diverged fungal pathogens: Lineage-specific transposable element content and efficiency of genome defenses. G3: Genes, Genomes, Genetics. 2021;11: 0–12. doi: 10.1093/g3journal/jkab068
  • 23. Devos KM, Brown JKM, Bennetzen JL. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 2002;12: 1075–1079. doi: 10.1101/gr.132102
  • 24. Brookfield JFY. The ecology of the genome—Mobile DNA elements and their hosts. Nat Rev Genet. 2005;6: 128–136. doi: 10.1038/nrg1524
  • 25. Stitzer MC, Anderson SN, Springer NM, Ross-Ibarra J. The genomic ecosystem of transposable elements in maize. PLoS Genet. 2021;17: e1009768. doi: 10.1371/journal.pgen.1009768
  • 26. Sigman MJ, Slotkin RK. The First Rule of Plant Transposable Element Silencing: Location, Location, Location. Plant Cell. 2016;28: 304–313. doi: 10.1105/tpc.15.00869
  • 27. Cridland JM, Macdonald SJ, Long AD, Thornton KR. Abundance and distribution of transposable elements in two drosophila QTL mapping resources. Mol Biol Evol. 2013;30: 2311–2327. doi: 10.1093/molbev/mst129
  • 28. Stuart T, Eichten SR, Cahn J, Karpievitch Y V, Borevitz JO, Lister R. Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation. Elife. 2016;5: 1–27. doi: 10.7554/eLife.20777
  • 29. Lai X, Schnable JC, Liao Z, Xu J, Zhang G, Li C, et al. Genome-wide characterization of non-reference transposable element insertion polymorphisms reveals genetic diversity in tropical and temperate maize. BMC Genomics. 2017;18: 1–13. doi: 10.1186/s12864-017-4103-x
  • 30. Stritt C, Gordon SP, Wicker T, Vogel JP, Roulin AC. Recent activity in expanding populations and purifying selection have shaped transposable element landscapes across natural accessions of the Mediterranean grass Brachypodium distachyon. Genome Biol Evol. 2017;10: 1–38. doi: 10.1093/gbe/evx276
  • 31. Oggenfuss U, Badet T, Wicker T, Hartmann FE, Singh NK, Abraham LN, et al. A population-level invasion by transposable elements triggers genome expansion in a fungal pathogen. Elife. 2021;10: 1–25. doi: 10.7554/eLife.69249
  • 32. Bleykasten-Grosshans C, Neuvéglise C. Transposable elements in yeasts. C R Biol. 2011;334: 679–686. doi: 10.1016/j.crvi.2011.05.017
  • 33. Lertwattanasakul N, Kosaka T, Hosoyama A, Suzuki Y, Rodrussamee N, Matsutani M, et al. Genetic basis of the highly efficient yeast Kluyveromyces marxianus: Complete genome sequence and transcriptome analyses. Biotechnol Biofuels. 2015;8: 1–14. doi: 10.1186/s13068-015-0227-x
  • 34. Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, et al. Structure and architecture of the maize genome. Plant Physiol. 2005;139: 1612–1624. doi: 10.1104/pp.105.068718
  • 35. Fouché S, Oggenfuss U, Chanclud E, Croll D. A devil’s bargain with transposable elements in plant pathogens. Trends in Genetics. 2022;38: 222–230. doi: 10.1016/j.tig.2021.08.005
  • 36. Raffaele S, Kamoun S. Genome evolution in filamentous plant pathogens: why bigger can be better. Nat Rev Microbiol. 2012;10: 417–430. doi: 10.1038/nrmicro2790
  • 37. Dong S, Raffaele S, Kamoun S. The two-speed genomes of filamentous pathogens: waltz with plants. Curr Opin Genet Dev. 2015;35: 57–65. doi: 10.1016/j.gde.2015.09.001
  • 38. Frantzeskakis L, Kusch S, Panstruga R. The need for speed: compartmentalized genome evolution in filamentous phytopathogens. Mol Plant Pathol. 2019;20: 3–7. doi: 10.1111/mpp.12738
  • 39. Torres DE, Oggenfuss U, Croll D, Seidl MF. Genome evolution in fungal plant pathogens: looking beyond the two-speed genome model. Fungal Biol Rev. 2020;34: 136–143. doi: 10.1016/j.fbr.2020.07.001
  • 40. Rouxel T, Grandaubert J, Hane JK, Hoede C, van de Wouw AP, Couloux A, et al. Effector diversification within compartments of the Leptosphaeria maculans genome affected by Repeat-Induced Point mutations. Nat Commun. 2011;2: 202. doi: 10.1038/ncomms1189
  • 41. Van Dam P, Fokkens L, Ayukawa Y, Van Der Gragt M, Ter Horst A, Brankovics B, et al. A mobile pathogenicity chromosome in Fusarium oxysporum for infection of multiple cucurbit species. Sci Rep. 2017;7: 1–15. doi: 10.1038/s41598-017-07995-y
  • 42. Seidl MF, Thomma BP. Transposable Elements Direct The Coevolution between Plants and Microbes. Trends in Genetics. 2017;xx: 1–10. doi: 10.1016/j.tig.2017.07.003
  • 43. Chujo T, Scott B. Histone H3K9 and H3K27 methylation regulates fungal alkaloid biosynthesis in a fungal endophyte-plant symbiosis. Mol Microbiol. 2014;92: 413–434. doi: 10.1111/mmi.12567
  • 44. Freitag M. Histone Methylation by SET Domain Proteins in Fungi. Annu Rev Microbiol. 2017;71: 413–439. doi: 10.1146/annurev-micro-102215-095757
  • 45. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007;8: 973–982. doi: 10.1038/nrg2165
  • 46. Wells JN, Feschotte C. A Field Guide to Transposable Elements. Annu Rev Genet. 2020;54: 7–34. doi: 10.1146/annurev-genet-040620-022145
  • 47. Kojima KK. AcademH, a lineage of Academ DNA transposons encoding helicase found in animals and fungi. Mob DNA. 2020;11: 1–11. doi: 10.1186/s13100-020-00211-1
  • 48. Lerat E, Rizzon C, Biémont C. Sequence divergence within transposable element families in the Drosophila melanogaster genome. Genome Res. 2003;13: 1889–1896. doi: 10.1101/gr.827603
  • 49. Blumenstiel JP. Birth, School, Work, Death, and Resurrection: The Life Stages and Dynamics of Transposable Element Proliferation. Genes (Basel). 2019;10. doi: 10.3390/genes10050336
  • 50. Volz EM, Koelle K, Bedford T. Viral Phylodynamics. PLoS Comput Biol. 2013;9. doi: 10.1371/journal.pcbi.1002947
  • 51. Stukenbrock EH, Banke S, Javan-Nikkhah M, McDonald BA. Origin and domestication of the fungal wheat pathogen Mycosphaerella graminicola via sympatric speciation. Mol Biol Evol. 2007;24: 398–411. doi: 10.1093/molbev/msl169
  • 52. Badet T, Oggenfuss U, Abraham LN, McDonald BA, Croll D. A 19-isolate reference-quality global pangenome for the fungal wheat pathogen Zymoseptoria tritici. BMC Biol. 2020;18: 12. doi: 10.1186/s12915-020-0744-3
  • 53. Omrane S, Sghyer H, Audeon C, Lanen C, Duplaix C, Walker AS, et al. Fungicide efflux and the MgMFS1 transporter contribute to the multidrug resistance phenotype in Zymoseptoria tritici field isolates. Environ Microbiol. 2015;17: 2805–2823. doi: 10.1111/1462-2920.12781
  • 54. Omrane S, Audéon C, Ignace A, Duplaix C, Aouini L, Kema G, et al. Plasticity of the MFS1 Promoter Leads to Multidrug Resistance in the Wheat Pathogen Zymoseptoria tritici. Mitchell AP, editor. mSphere. 2017;2: 1–42. doi: 10.1128/mSphere.00393-17
  • 55. Mäe A, Fillinger S, Sooväli P, Heick TM. Fungicide Sensitivity Shifting of Zymoseptoria tritici in the Finnish-Baltic Region and a Novel Insertion in the MFS1 Promoter. Front Plant Sci. 2020;11: 1–10. doi: 10.3389/fpls.2020.00385
  • 56. Krishnan P, Meile L, Plissonneau C, Ma X, Hartmann FE, Croll D, et al. Transposable element insertions shape gene regulation and melanin production in a fungal pathogen of wheat. BMC Biol. 2018;16: 1–18. doi: 10.1186/s12915-018-0543-2
  • 57. Meile L, Croll D, Brunner PC, Plissonneau C, Hartmann FE, McDonald BA, et al. A fungal avirulence factor encoded in a highly plastic genomic region triggers partial resistance to septoria tritici blotch. New Phytologist. 2018;219: 1048–1061. doi: 10.1111/nph.15180
  • 58. Wang C, Milgate A, Solomon P, McDonald MC. The identification of a mobile repetitive element affecting the asexual reproduction of the wheat pathogen Zymoseptoria tritici. Mol Plant Pathol. 2021; 1–17. doi: 10.1111/mpp.13064
  • 59. Dhillon B, Cavaletto JR, Wood K V, Goodwin SB. Accidental Amplification and Inactivation of a Methyltransferase Gene Eliminates Cytosine Methylation in Mycosphaerella graminicola. Genetics. 2010;186: 67–U139. doi: 10.1534/genetics.110.117408
  • 60. Dhillon B, Gill N, Hamelin RC, Goodwin SB. The landscape of transposable elements in the finished genome of the fungal wheat pathogen Mycosphaerella graminicola. BMC Genomics. 2014;15. doi: 10.1186/1471-2164-15-1132
  • 61. Lorrain C, Oggenfuss U, Croll D, Duplessis S, Stukenbrock EH. Transposable Elements in Fungi: Coevolution With the Host Genome Shapes, Genome Architecture, Plasticity and Adaptation. Encyclopedia of Mycology. Elsevier; 2021. pp. 142–155.
  • 62. Möller M, Habig M, Lorrain C, Feurtey A, Haueisen J, Fagundes WC, et al. Recent loss of the Dim2 DNA methyltransferase decreases mutation rate in repeats and changes evolutionary trajectory in a fungal pathogen. PLoS Genet. 2021;17: 1–27. doi: 10.1371/journal.pgen.1009448
  • 63. Feurtey A, Lorrain C, McDonald MC, Milgate AW, Solomon PS, Warren R, et al. A thousand-genome panel retraces the global spread and climatic adaptation of a major crop pathogen. bioRxiv. 2022; 1–24. doi: 10.1101/2022.08.26.505378
  • 64. Schotanus K, Soyer JL, Connolly LR, Grandaubert J, Happel P, Smith KM, et al. Histone modifications rather than the novel regional centromeres of Zymoseptoria tritici distinguish core and accessory chromosomes. Epigenetics Chromatin. 2015;8. doi: 10.1186/s13072-015-0033-5
  • 65. Kettles GJ, Hofinger BJ, Hu P, Bayon C, Rudd JJ, Balmer D, et al. sRNA profiling combined with gene function analysis reveals a lack of evidence for cross-kingdom RNAi in the wheat–Zymoseptoria tritici pathosystem. Front Plant Sci. 2019;10. doi: 10.3389/fpls.2019.00892
  • 66. Habig M, Schotanus K, Hufnagel K, Happel P, Stukenbrock EH. Ago1 affects the virulence of the fungal plant pathogen Zymoseptoria tritici. Genes (Basel). 2021;12. doi: 10.3390/genes12071011
  • 67. Wei K, Aldaimalani R, Mai D, Zinshteyn D, PRV S, Blumenstiel JP, et al. Rethinking the “gypsy” retrotransposon: A roadmap for community-driven reconsideration of problematic gene names. OSFpreprints. 2022;10.31219/o.
  • 68. Bleykasten-Grosshans C, Fabrizio R, Friedrich A, Schacherer J. Species-Wide Transposable Element Repertoires Retrace the Evolutionary History of the Saccharomyces cerevisiae Host. Mol Biol Evol. 2021;38: 4334–4345. doi: 10.1093/molbev/msab171
  • 69. Pereira D, Oggenfuss U, McDonald BA, Croll D. Population genomics of transposable element activation in the highly repressive genome of an agricultural pathogen. Microb Genom. 2021;7. doi: 10.1099/mgen.0.000540
  • 70. Chen RS, McDonald BA. Sexual reproduction plays a major role in the genetic structure of populations of the fungus Mycosphaerella graminicola. Genetics. 1996;142: 1119–1127. doi: 10.1093/genetics/142.4.1119
  • 71. Stukenbrock EH, Dutheil JY. Fine-Scale Recombination Maps of Fungal Plant. 2018;208: 1209–1229. doi: 10.1534/genetics.117.300502/-/DC1.1
  • 72. Plissonneau C, Stürchler A, Croll D. The Evolution of Orphan Regions in Genomes of a Fungal Pathogen of Wheat. mBio. 2016;7: 1–13. doi: 10.1128/mBio.01231-16
  • 73. Kahle D, Wickham H. ggmap: Spatial visualization with ggplot2. R Journal. 2013;5: 144–161. doi: 10.32614/rj-2013-014
  • 74. Smit A, Hubley R, Green P. RepeatMasker Open-4.0. http://www.repeatmasker.org.
  • 75. Xu Z, Wang H. LTR-FINDER: An efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35: 265–268. doi: 10.1093/nar/gkm286
  • 76. Gao D, Li Y, Kim K Do, Abernathy B, Jackson SA. Landscape and evolutionary dynamics of terminal repeat retrotransposons in miniature in plant genomes. Genome Biol. 2016;17: 7. doi: 10.1186/s13059-015-0867-y
  • 77. Ma B, Li T, Xiang Z, He N. MnTEdb, a collective resource for mulberry transposable elements. Database. 2015;2015: 1–10. doi: 10.1093/database/bav004
  • 78. Crescente JM, Zavallo D, Helguera M, Vanzetti LS. MITE Tracker: An accurate approach to identify miniature inverted-repeat transposable elements in large genomes. BMC Bioinformatics. 2018;19: 1–10. doi: 10.1186/s12859-018-2376-y
  • 79. Mao H, Wang H. SINE-scan: An efficient tool to discover short interspersed nuclear elements (SINEs) in large-scale genomic datasets. Bioinformatics. 2017;33: 743–745. doi: 10.1093/bioinformatics/btw718
  • 80. Wenke T, Dobel T, Sorensen TR, Junghans H, Weisshaar B, Schmidt T. Targeted Identification of Short Interspersed Nuclear Element Families Shows Their Widespread Existence and Extreme Heterogeneity in Plant Genomes. The Plant Cell Online. 2011;23: 3117–3128. doi: 10.1105/tpc.111.088682
  • 81. Breen J, Li D, Dunn DS, Békés F, Kong X, Zhang J, et al. Wheat beta-expansin (EXPB11) genes: Identification of the expressed gene on chromosome 3BS carrying a pollen allergen domain. BMC Plant Biol. 2010;10. doi: 10.1186/1471-2229-10-99
  • 82. Feurtey A, Lorrain C, Croll D, Eschenbrenner C, Freitag M, Habig M, et al. Genome compartmentalization predates species divergence in the plant pathogen genus Zymoseptoria. BMC Genomics. 2020;21: 588. doi: 10.1186/s12864-020-06871-w
  • 83. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25: 2078–2079. doi: 10.1093/bioinformatics/btp352
  • 84. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215: 403–410. doi: 10.1016/S0022-2836(05)80360-2
  • 85. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2014;12: 59–60. doi: 10.1038/nmeth.3176
  • 86. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol. 2013;30: 772–780. doi: 10.1093/molbev/mst010
  • 87. Hane JK, Oliver RP. RIPCAL: A tool for alignment-based analysis of repeat-induced point mutations in fungal genomic sequences. BMC Bioinformatics. 2008;9: 1–12. doi: 10.1186/1471-2105-9-478
  • 88. Hane JK, Oliver RP. In silico reversal of repeat-induced point mutation (RIP) identifies the origins of repeat families and uncovers obscured duplicated genes. BMC Genomics. 2010;11. doi: 10.1186/1471-2164-11-655
  • 89. Paradis E. Pegas: An R package for population genetics with an integrated-modular approach. Bioinformatics. 2010;26: 419–420. doi: 10.1093/bioinformatics/btp696
  • 90. Paradis E, Schliep K. Ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35: 526–528. doi: 10.1093/bioinformatics/bty633
  • 91. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2020.
  • 92. Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980;16: 111–120. doi: 10.1007/BF01731581
  • 93. Quinlan AR, Hall IM. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26: 841–842. doi: 10.1093/bioinformatics/btq033
  • 94. Grandaubert J, Bhattacharyya A, Stukenbrock EH. RNA-seq-Based Gene Annotation and Comparative Genomics of Four Fungal Grass Pathogens in the Genus Zymoseptoria Identify Novel Orphan Genes and Species-Specific Invasions of Transposable Elements. G3-Genes Genomes Genetics. 2015;5: 1323–1333. doi: 10.1534/g3.115.017731
  • 95. Rice P, Longden L, Bleasby A. EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics. 2000;16: 276–277. doi: 10.1016/s0168-9525(00)02024-2
  • 96. Testa AC, Oliver RP, Hane JK. OcculterCut: A comprehensive survey of at-rich regions in fungal genomes. Genome Biol Evol. 2016;8: 2044–2064. doi: 10.1093/gbe/evw121
  • 97. van Wyk S, Harrison CH, Wingfield BD, De Vos L, van der Merwe NA, Steenkamp ET. The RIPper, a web-based tool for genome-wide quantification of Repeat-Induced Point (RIP) mutations. PeerJ. 2019;7: e7447. doi: 10.7717/peerj.7447
  • 98. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25: 1972–1973. doi: 10.1093/bioinformatics/btp348
  • 99. Charif D, Lobry JR. SeqinR 1.0–2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. In: Bastolla U, Porto M, Roman HE, Vendruscolo M, editors. Structural approaches to sequence evolution: Molecules, networks, populations. New York: Springer Verlag; 2007. pp. 207–232.
  • 100. Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016.
  • 101. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17: 540–552. doi: 10.1093/oxfordjournals.molbev.a026334
  • 102. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30: 1312–1313. doi: 10.1093/bioinformatics/btu033
  • 103. Wang LG, Lam TTY, Xu S, Dai Z, Zhou L, Feng T, et al. Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data. Mol Biol Evol. 2020;37: 599–603. doi: 10.1093/molbev/msz240
  • 104. Müller K, Wickham H. tibble: Simple Data Frames. R package version 3.0.1. 2020.
  • 105. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the Tidyverse. J Open Source Softw. 2019;4: 1686. doi: 10.21105/joss.01686
  • 106. Wickham H, François R, Henry L, Müller K. dplyr: A Grammar of Data Manipulation. R package version 1.0.0. 2020.
  • 107. Revell LJ. phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012;3: 217–223. doi: 10.1111/j.2041-210X.2011.00169.x
  • 108. Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. Ggtree: an R Package for Visualization and Annotation of Phylogenetic Trees With Their Covariates and Other Associated Data. Methods Ecol Evol. 2017;8: 28–36. doi: 10.1111/2041-210X.12628
  • 109. Smith MR. TreeTools: create, modify and analyse phylogenetic trees. https://cran.r-project.org/web/packages/TreeTools/index.html. 2019. doi: 10.5281/zenodo.3522725
  • 110. Collins C, Didelot X. A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination. McHardy AC, editor. PLoS Comput Biol. 2018;14: e1005958. doi: 10.1371/journal.pcbi.1005958

Decision Letter 0

Bart Thomma

26 Sep 2022

Dear Prof Croll,

Thank you very much for submitting your manuscript "Recent transposable element bursts triggered by insertions near genes in a fungal plant pathogen" for consideration at PLOS Pathogens. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. First of all, I would like to apologise for the lengthy review process. I had difficulties finding appropriate reviewers at the start of, and later right after, the holiday season. Eventually, however, four reviewers agreed to review, and they are quite unanimous in their assessment. In light of the reviews (below this email), we would like to invite the resubmission of a significantly revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Bart Thomma

Section Editor

PLOS Pathogens

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

***********************

Reviewer's Responses to Questions

Part I - Summary

Please use this section to discuss strengths/weaknesses of study, novelty/significance, general execution and scholarship.

Reviewer #1: In many plant pathogenic fungi, as in other organisms, transposable elements (TEs) are abundant in genomes and contribute to their evolution. In these fungi, however, little is known about how bursts of TE copies have occurred, owing to the lack of high-quality genome assemblies. This paper characterizes TE burst events in the wheat pathogen Zymoseptoria tritici using complete (telomere-to-telomere) genome sequences of 19 isolates. Using a phylogenetic approach, the authors divided TEs with higher copy numbers into copies derived from burst events and all others. The analysis revealed that certain TE families experienced one or several bursts and that TEs in the burst clade tended to be located closer to genes. The paper shows the association between TE bursts and genomic environments at the level of individual TE families. Together with the accumulated body of evolutionary work on this fungus, it can contribute to understanding what has driven the pathogen’s evolution, and it may have an impact on related research on other plant pathogens. However, there are fatal data labeling errors and opaque descriptions, which make it hard to judge whether some of the authors’ interpretations are correct, as listed below. These data should be reanalyzed and reinterpreted carefully. Despite these mistakes, the overall approach to the scientific question seems reasonable, so the manuscript will be fine as long as the erroneous data and the accompanying interpretations are corrected without any contradiction.

Reviewer #2: I think that the paper summarizes good work.

The authors use 19 isolates to describe the history of TE evolution in different Z. tritici strains. They characterize all TEs in all 19 isolates, identify specific characteristics of each TE family and, using two of these families, try to describe bursts of expansion. It is known that genome evolution is shaped by TE bursts, and the authors therefore try to connect TE bursts to the evolution of individual strains isolated in different parts of the world.

Although the paper is readable, some parts rest on many assumptions, and the last paragraph of the results gives little indication of where to find the interesting results in the figures. Ultimately, the paper is difficult to read for people who are not in the field of TE evolution.

Reviewer #3: The manuscript “Recent transposable element bursts triggered by insertions near genes in a fungal plant pathogen” by Oggenfuss and Croll is a fascinating deep dive into the transposable element dynamics of the fungal pathogen Zymoseptoria tritici. Taking advantage of a recent population genomic dataset of 19 strains, the authors take a unique approach of evaluating the genomic niche that different TEs occupy within the genomes. They determine what properties of these niches correspond with given TE superfamilies and uncover interesting patterns with regard to which elements are currently undergoing expansions within the population. The work is very methodical and encompassing, however I feel that a lot of improvement should be made before a final version is ready for a broader scientific audience. In particular, I believe that the current draft will be very difficult to understand for readers that are not experts on transposable elements. I hope that the following comments will help the authors rework their manuscript.

Reviewer #4: I have now read and considered the manuscript entitled “Recent transposable element bursts triggered near genes in a fungal plant pathogen”.

Overall this is a well-written manuscript that leverages 19 complete telomere-to-telomere assemblies to examine the evolution of transposon families that have undergone recent ‘bursts’ within Zymoseptoria tritici. The authors have uncovered some interesting patterns, for example that small non-autonomous transposons (MITEs) are more likely to escape genome defence mechanisms and reside near genes. The authors also explore in more detail two very large transposons that have undergone recent expansion (Cassini, Deimos). I do not find any flaws in the analyses or data presented; however, I found the amount of information presented in the figures overwhelming, which made it difficult to distil out the main messages that the authors wish to communicate. I realise that this is likely a shortened revision of a previous manuscript, based on my observation that the text references five figures but there are seven figures in the material. Also, there are more panels listed in the figure legends than there are in the figures. So I suspect that I am asking the authors to simplify even further a longer version of a prior draft manuscript.

Below are some comments/suggestions that I hope will improve clarity and help the authors better communicate their main findings.

**********

Part II – Major Issues: Key Experiments Required for Acceptance

Please use this section to detail the key new experiments or modifications of existing experiments that should be absolutely required to validate study conclusions.

Generally, there should be no more than 3 such required experiments or major modifications for a "Major Revision" recommendation. If more than 3 experiments are necessary to validate the study conclusions, then you are encouraged to recommend "Reject".

Reviewer #1: 1. There are several fatal errors in Figure 3.

1.1. Some TE families seem mislabeled in Figure 3A-E, which leads to contradictions with Figure 3F and the text. For example, the description in lines 177-178 contradicts Figure 3B. The medians for DTX_MITE_Goblin and RLC_Deimos (the fourth and fifth from the left) appear to be below 40% and over 60%, respectively, in the figure, contrary to the text description. Moreover, the values shown in Figure 3B-E do not correspond to those shown in Figure 3F. This is easily seen from the plots of DTX_MITE_Goblin [the highest RIP-like mutation in (E)], RLC_Deimos [shorter length in (C)], and RLG_Uranus [the highest nucleotide diversity in (D)], for instance.

1.2. It seems strange that the scale for “RIP-like mutation” in Figure 3E and F is inconsistent, without any explanation.

1.3. The explanation for each panel in the legend is off by one or two, probably because two panels [former (A) and (G)] were moved to Supplementary Figure S2A. Please correct the panel letters in the Figure 3 legend.

There are additional suggestions for Figure 3 as follows.

1.4. Please explain in the legend what the red horizontal line in Figure 3B and C indicates. In line 176, please state the percentage of “the genome-wide GC content”. Does the red horizontal line in Figure 3B indicate that?

1.5. In Figure 3D, can the nucleotide diversity be shown on a logarithmic scale? Otherwise, it is hard to see that “MITEs tend to have higher nucleotide diversity” as described in line 184.

1.6. If the box plots and/or bar graphs in Figure 3A-E were colored distinctly for each superfamily, it would be easier to catch some of the trends discussed.

2. There are also a suggestion and errors concerning Figure 4.

2.1. Why was DTX_MITE_Goblin excluded from the analysis in Figure 4A?

2.2. In Figure 4D, the meaning of the colors is inconsistent between the in-figure legend and the text legend. Red and blue respectively show “no burst” and “burst” in the figure, but the opposite in the text legend in lines 760-761.

2.3. The discussion in line 261 is inconsistent with the result in Figure 4D. The “lower” should be “higher”.

3. Some results are described without showing data, exemplified as follows. It is desirable to visualize or summarize the data in additional figures or tables.

3.1. The description in line 155, “TE insertions are at higher density on accessory chromosomes (Figure 2B)”, is not supported by Figure 2B which shows “number of copies” but not “density”.

3.2. As for the description in lines 217-218 (“Overall, around 50% ... several bursts”), could it be shown which TE families experienced how many bursts, and in how many of the 19 genomes?

3.3. Could the data for a discussion in lines 322-324 (“We found ... most analyzed genomes).”) be shown?

4. Please add a clear explanation for the difference between “RIP index” and “RIP-like mutations” analyzed in Figures 2F, 4B and D.

5. The trend that burst TE copies tended to be located near genes (Figure 4D) is not observed in some TE families (e.g. RII_Cassini and DTX_MITEs_Goblin) (Figure 5). Please add a discussion of this.

6. It seems better to change the paper title from “… triggered by …” to “… associated with …” or another such milder expression, since this paper shows no direct evidence for the causal relationship expressed by the current title. Lines 130-131 (“The comprehensive … number expansions.”) and 263-264 (“Our analyses ... coding sequences.”) also seem to be overstated.

Reviewer #2: Line 140: positive correlation needs to be assessed with statistical means.

Line 147: What do you mean by the sentence “Only few TE insertions are fixed among genomes (n = 122; Figure 1D) and consist predominantly of MITEs”? Reading the paragraph, I understand that it refers to a specific family being present in multiple isolates or in only one, but the figure legend is more about location than about presence/absence of a family: “Allele frequencies of TEs at orthologous insertion loci among genomes”. Can you explain this better in the text?

Line 150: I understand what the authors did, but I think that it should be explained better. I can understand that the authors expect readers to know which markers are associated with euchromatin and which with heterochromatin, but I do not think this is common knowledge; please explain which marker goes with heterochromatin and which with euchromatin.

Line 155: “, but TE insertions are at higher density on accessory chromosomes”. Honestly, I cannot see where this is reported in Figure 2B. In contrast, the bar in the chromosome panel of Figure 2B shows a higher number of TEs in the core genome. I think that the authors should find a different way to plot density as a function of core/accessory chromosome size.

Line 184: “MITEs tend to have higher nucleotide diversity at similar copy numbers compared to other TE families (Figure 3D, F)”. I find this difficult to see. In Figure 3D there are many MITE elements, but only one has a high nucleotide diversity: DTX_MITE_Toll. The rest are normal.

Line 185: Looking at RIP, only MITE_Goblin has a high value of RIP compared to the others.

Line 193: “Our findings show that recent bursts are characterized by high copy numbers, but genomic defences and the length of TEs create complex outcomes for individual TE families.” Honestly, I do not understand where this comes from or what it means.

The paragraph entitled “NICHE CHARACTERISTICS OF TRANSPOSABLE ELEMENT ACTIVATION” is very difficult to read. The paragraph is short and refers to Figures 5 and 6, each of which has panels A-E. The authors describe their findings but do not indicate clearly where to find the information that supports their statements. This paragraph should be rewritten and the results explained better.
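
The distinction raised under Line 155 (and in Reviewer #1's point 3.1) between the number of TE copies and TE density can be illustrated with a minimal sketch. The copy counts and compartment lengths below are hypothetical placeholders, not values from the manuscript, and `density_per_mb` is an illustrative helper rather than the authors' method; the sketch only shows that a compartment with fewer copies can still have the higher density once counts are normalized by sequence length.

```python
from dataclasses import dataclass


@dataclass
class Compartment:
    name: str
    te_copies: int   # number of TE insertions (hypothetical value)
    length_bp: int   # total sequence length in bp (hypothetical value)


def density_per_mb(c: Compartment) -> float:
    """TE copies per megabase of sequence."""
    return c.te_copies / (c.length_bp / 1_000_000)


# Hypothetical numbers chosen only to illustrate the normalization.
compartments = [
    Compartment("core", te_copies=900, length_bp=36_000_000),
    Compartment("accessory", te_copies=300, length_bp=4_000_000),
]

densities = {c.name: density_per_mb(c) for c in compartments}

for name, d in densities.items():
    print(f"{name}: {d:.1f} TE copies per Mb")
```

With these placeholder values the core compartment holds more copies in absolute terms, while the accessory compartment has the higher density, which is the kind of contrast a count-based bar plot alone cannot show.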

Reviewer #3: I have only one major concern with the analyses, which involves RIP. Given that RIP is a stochastic process, some of the underlying assumptions that the authors appear to have made may be violated. Firstly, RIP becomes inefficient for repeat lengths below ~ 500 bp (see https://doi.org/10.1038/ncomms4509), thus the smaller TEs like MITEs are likely undergoing fundamentally different evolutionary processes. This could cause a number of issues when lumping all of the analyses together. For example, using the same branch length cutoffs for both the MITEs and LTRs could be underestimating the number and nature of burst events. Second, it seems that the authors have come to some concrete conclusions about precisely which element is the ‘master’ copy of burst events. It is not clear from the current draft exactly how this was done, but both the nature of RIP, and the fact that the MITEs are non-autonomous, should make this very difficult if not impossible. Lastly, as the authors bring up in the discussion, some strains have lost RIP or components of it, which would mean that any TE dynamics in those strains would be fundamentally different from the others. It is not mentioned which strains have lost it, and as the results never break down the dynamics by strain, one cannot currently evaluate this. Some of these points could directly impact the main conclusion, that bursts are driven by insertions near genes. For example, is the reported copia element burst in a strain without RIP? Is it then true that bursts originate from gene rich regions, or does that strain have a unique distribution of TEs because it has lost its genome defense machinery? Likewise, although the MITEs are in gene rich regions, where are the protein coding elements? Isn’t that the more important question with regard to what’s driving the expansions?
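
One way to act on the length concern above is to stratify TE families by element length before applying any shared cutoff. The sketch below is only an illustration under stated assumptions: the ~500 bp threshold is the approximate value quoted by the reviewer, the family names and lengths are hypothetical placeholders, and `stratify_by_length` is an illustrative helper, not part of the authors' pipeline.

```python
# Approximate length below which RIP is described as inefficient
# (value taken from the reviewer's comment; treated here as an assumption).
RIP_MIN_LENGTH_BP = 500


def stratify_by_length(family_lengths, threshold=RIP_MIN_LENGTH_BP):
    """Split TE families into those below and at/above a length threshold."""
    below = sorted(n for n, length in family_lengths.items() if length < threshold)
    at_or_above = sorted(n for n, length in family_lengths.items() if length >= threshold)
    return {"below_threshold": below, "at_or_above_threshold": at_or_above}


# Hypothetical family names and consensus lengths in bp, for illustration only.
example_lengths = {
    "MITE_example": 300,
    "LTR_example_1": 5000,
    "LTR_example_2": 6000,
}

groups = stratify_by_length(example_lengths)
print(groups)
```

Each group could then be analyzed with its own branch-length cutoff, so that short non-autonomous elements and long LTR retrotransposons are not forced through the same criterion.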

In addition to this issue, the introduction could be improved. Large parts currently read as a list of interesting facts that are only loosely tied together, and are sometimes repetitive. e.g. Line 54 talks about silencing of TEs, and the same information is presented again on Line 120. Other important information is either hidden in the supplement (like what the codes RLC, RLG etc. mean, or even definitions for Copia and Gypsy), or not presented until the discussion, such as much of the information about RIP. Furthermore, the start of the results section seems to confound analyses from the previous study (Badet et al., 2020) with what is done here. e.g. Line 140: "show a positive correlation", this does not appear evident from Figure 1, but is clear from Badet et al., 2020 figure 5A. The authors should be clear about what information has been presented again for the sake of this study and what are new analyses.

Reviewer #4: The authors should include a better introduction to transposon families, in particular the three families that they focus on throughout the results (MITEs, Copia, Gypsy).

Introduction: For non-specialist readers it may be helpful to briefly introduce Class I vs. Class II TEs. Many of the TE names listed in the manuscript use the Wicker classification abbreviations (RLX, RLC, RLG), but these are never defined. Might it be enough to define Copia/Gypsy, LINEs and MITEs? I also note that a new name for the TE family Gypsy has been proposed to avoid some of the negative history associated with this name; I wonder if the authors would consider adopting this new name. Lines 96-101 may be a good place to revise and add this information: the current text there, which describes the loss of long terminal repeats, will probably not make much sense to someone who is not familiar with TE families that contain these features.

The loss of dim2 in some populations of Z. tritici is an important point that is not very clearly linked to the results where two individual genomes contain significant TE bursts. Is the Australian isolate one of these genomes? More clearly signposting this in the results (or even in Fig1?) will improve clarity as to why TE bursts are more likely to be observed in some isolates.

I have struggled with the ancestral reconstruction analysis and phylogenetic trees in the context of RIP. It was not clear to me from the methods if the AR was conducted on “de-RIPed” alignments? In my experience de-RIPing or removing sites in an alignment with RIP-like mutations increases pairwise sequence identity of alignments dramatically. I have concerns that the clades with high RIP mutations in figures 6-7 are grouping mainly on AT richness (text figure 5?), would trees constructed with de-ripped sequences give a better representation of the relationships within this TE family? I can see clearly for Deimos that the burst clade has very low RIP. This is less obvious for Cassini because the bursts are much smaller and a little nested within the tree. I did not understand the dot plots in all parts of Figure 6-7 and the statistical tests associated with these. More clarity around this analysis may address this question I have about ancestor reconstruction and the link to RIP-like mutations but these figures are not discussed in the current text.

All figures: Please consider reducing the number of panels. What are your main messages, can you remove “GC fragment” or “Large RIP affected Regions” etc etc? Do these different categories show very different things? If yes, then this should be more clearly discussed in the text for each figure, where this data remains.

Figure 1: Parts A and B These are re-purposed figures from previous work, do you need to re-present both here, or is part B sufficient to show genome size is correlated with transposon copy number? Is the date of sampling important for isolates with TE bursts? Or perhaps indicating which isolates have a functional dim2 gene and therefore are capable of RIP?

Fig1D: Allele frequencies? Allele counts? Frequency is between 0-1? May consider moving text “insertions were considered orthologous…”to main results text rather than the legend

Figure 2: Please consider moving the Legend or making it more visible. Currently it is embedded in the middle of part C and not clear that it is the color legend for other parts of the figure too.

Figure 4D, Figure 5: are all these panels needed?

Figures 6-7: What is part D “GC content of the copy”?

Could the authors please provide the fasta files used to construct phylogenetic trees? In their public datasets on github (if not too large) or Zenodo database?

**********

Part III – Minor Issues: Editorial and Data Presentation Modifications

Please use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity.

Reviewer #1: INTRODUCTION

Line 54: Please remove “and chromatin”, redundant with “histone modifications” in lines 53-54.

Line 110: Better to add “(Mycosphaerella graminicola)” after “Zymoseptoria tritici”.

Line 119: Please explain “MgDNMT” briefly (e.g. a DNA methyltransferase).

Line 131: “Escape of” → “Escape from”

RESULTS

Line 151: “5kb” → “5 kb”

Line 159: Please mention which TE category in Figure 2 (“MITE”, “Copia/Gypsy”, “LINE” or “other”) corresponds to “RLC” and “RLG”.

Lines 212-213: The percentage “20.7%” is not “more than half”.

Line 236: The adverb “Similarly” is confusing since the traits mentioned in this sentence (LRAR and closest gene of burst copies) seem opposite between RII_Cassini and RLC_Deimos.

DISCUSSION

Line 335: “our a” → “our” or “a”

METHODS

Line 362: Delete “(Supplementary Figure S4)”.

Line 396: “TheRIPper” → “The RIPper”

Line 405: Insert “(Supplementary Figure S4)” here, in place of the reference deleted from line 362.

REFERENCES

Lines 488-489: Please add the journal and volume.

Line 547: zymoseptoria → Zymoseptoria

Lines 582-584: The paper is now published (doi: 10.1093/g3journal/jkab068). Please cite the peer-reviewed one.

Lines 601-603: Please add the journal and volume.

Lines 654 and 655: Please cite the URL.

FIGURES

Figure 1D: Why is the font size of “9” on the x-axis larger?

Figure 2A: Please describe what the x-axis represents.

Figure 2B legend: Please add “(LRAR)” after “large RIP affected regions”.

Figure 2F: Could the statistical significance be shown? (e.g. Put an asterisk to the correlation coefficients supported with p < 0.05)

Figure 5: Please explain “Mutations per bp” in the legend briefly. Is the mutation rate based on the consensus sequence for each family?

Supplementary Figures

Figure S3 title: “DTX_MITE_Gobblin” → “DTX_MITE_Goblin”

Reviewer #2: The authors should point better to the panels in each figure to help the reader to follow what they claim

Reviewer #3: Line 35: Is this only purifying selection or could there be other insertion preferences etc?

Line 43: a genomic environments or genomic environment.

Line 49: define abbreviation (TEs)

Line 51: "the potential of deleterious insertions into coding and regulatory regions", confusing wording, consider rephrasing.

Line 59: "in all copies". RIP can target all copies, but does not necessarily hit every copy due to its stochastic nature.

Line 85: explain positive effect.

Line 96: RIP will also create long branches

Line 111: co-evolved in what way? Is it an obligate pathogen in wheat? Is it specific to cultivated varieties?

Line 113: comma after reference

Line 141: In Supplementary table 1 there are several isolates with a lower TE-content than 18.1%.

Figure 1 seems to reproduce some information from the Badet et al., 2020 paper, such as the map. Is this necessary?

Figure 1A: Difficult to see differences in TE content in figure 1A, scale only goes between 17-24 (very small). Australia has no estimated TE-content, or is it the most?

Figure 1B: Figure text on x-axis is tiny.

Figure 1C: What about non-MITE DNA transposons, are they all “other”? Both figure and text make it unclear what the abundance of these elements is. With this number of MITEs there must be some master copies, which are hijacked by the MITEs. They are likely <20 copies.

Figure 1C: “Between the same set of orthologous genes”, what if the genes are on the border of a large TE-island? Is this comparable to small insertion sites?

Figure 1 C, a lot of windows are 100% TE. Presumably this is due to nesting. Does that affect the results?

Table S2 contains a massive amount of information. This is good in terms of reproducibility and open science. However it makes it difficult to find the information pointed to as in text references. e.g. Line 145. Perhaps a more readable table with a summary of important information, like what superfamilies given families are in, is needed.

Line 145-146: Neither figure 1C nor supplementary figure 1 are showing good comparisons of the number of families belonging to superfamilies in the different genomes.

Line 151: why 5 kb? Does “centered around” include the TE itself, or only the flanks? If the former then many of the retrotransposons may be larger than 10 kb and would have skewed values due to this.
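
The distinction raised in this comment can be made concrete with a minimal sketch. It assumes a niche is summarized by GC content and that "centered around" means a fixed flank on each side of the TE; the function names, the coordinate convention (0-based, end-exclusive) and the `include_te` switch are hypothetical and are not taken from the manuscript.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")


def niche_gc(chrom: str, te_start: int, te_end: int,
             flank: int = 5000, include_te: bool = False) -> float:
    """GC content of the niche around a TE located at chrom[te_start:te_end].

    With include_te=False only the two flanks are scored; with
    include_te=True the TE sequence itself is scored as well, so a long,
    AT-rich element can dominate the value.
    """
    left = chrom[max(0, te_start - flank):te_start]
    right = chrom[te_end:te_end + flank]
    middle = chrom[te_start:te_end] if include_te else ""
    return gc_content(left + middle + right)


# Toy chromosome: GC-only flanks around an AT-only "TE" (illustrative only).
toy = "GC" * 10 + "AT" * 10 + "GC" * 10
flanks_only = niche_gc(toy, 20, 40, flank=20)
with_te = niche_gc(toy, 20, 40, flank=20, include_te=True)
```

In this toy case the two definitions give clearly different values for the same insertion, which is the skew the comment is concerned about for retrotransposons longer than the window.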

Line 156 and 159: Is the GC association only due to RIP?

Supplementary figure 1: All text is way too small

Figure 2A: Label X axis. Y axis is TE density? The colours should be stated here as well, this is an issue for many of the figures. I recommend either clearly having the legend in part A, or prominently at the side in each figure.

Figure 2B: The fact that there are more TEs in core likely reflects that there is more DNA in the core. Should show as a proportion. Same with telomere and LRAR.
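
The normalization suggested here could be sketched as follows. This is only an illustration: the compartment names, the counts and the lengths are invented placeholder values, not data from the study, and `te_density_per_mb` is a hypothetical helper.

```python
def te_density_per_mb(te_counts: dict[str, int],
                      compartment_bp: dict[str, int]) -> dict[str, float]:
    """TE copies per megabase for each genomic compartment."""
    return {
        name: te_counts[name] / (compartment_bp[name] / 1e6)
        for name in te_counts
    }


# Invented numbers: more copies in the larger compartment,
# but a higher density in the smaller one.
counts = {"core": 900, "accessory": 150}
lengths = {"core": 36_000_000, "accessory": 3_000_000}
density = te_density_per_mb(counts, lengths)
```

With raw counts the larger compartment looks richer in TEs, while per-megabase densities can reverse the ranking.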

I question whether setting a cutoff for low GC at < 50% is meaningful when the genomes contain 51% - 52% GC.

Figure 2 C: What are the weird peaks around 43% and 49%?

Line 168: "Longer TE copies tend to be located in already TE rich niches", is that length fragment? Throughout the text and figure 2 it is not clear what “fragment” refers to.

Supplementary figure 3A: are all clades bursts? The black bar on the outside is not that informative. The axes should be labeled clearer, not evident what's meant by copy and sequence.

Line 175: “We find that high-copy TE families tend to also have more variable copy numbers among 19 genomes, indicating ongoing activity of individual families”. Why does this indicate ongoing activity? Also, this is more of a discussion point.

Line 179: “with the shorter copies belonging to the non-autonomous MITEs” And what is the longest copy?

Line 182: RIP is not clock-like, so it cannot be considered “age” for ripped sequences.

Figure 3: I think the figure has labeling issues. It looks like the fourth column is DTX_MITE_Goblin, which has a consensus length of 226 bp, but the bar plot seems to show that it's over 5000 bp. Likewise the fifth column should be RLC_Deimos at 6356 bp, but the plot shows a very small size, currently unreadable. I didn't go through every column, but this should be checked.

Figure 3C: log scale would be better to show lengths as many look like 0. Perhaps for nucleotide diversity as well.

Figure 3F: The text in the plot is too small. It is also difficult to tell what TE Superfamilies the different data points correspond to (except for MITE).

Figure 3: There is no panel G or H? These sound like the same as supplementary figure S2A and S2B

Line 198: 'the factors'

Figure 4 B: I don't understand this plot at all

Line 200: What superfamilies do these belong to? It is quite difficult trying to figure out what type of elements these are.

Figure 5: The colours do not show up on many of the violin plots. Another way of labeling would help.

Figure 5: Colors in legend and figure text do not match. Legend: “Red not part of burst”. Text: “copies in burst (red)”

Figure 5: Confusing to have families that have both burst/non-burst and families that only have non-burst(?) in the same figure.

Figure 6: It looks like there may be a lot of bursts of intermediate age, does this affect the results? What do the dot plots show? It’s not clear what the x axis refers to.

Figure 6 - 7: Do trees have support?

Line 260: 'of TEs'

Line 384: While Kimura distances can be calculated on RIPped sequences, they mean something very different from how Kimura distance is normally discussed in TE literature i.e. these are not mutating at constant rates and don’t refer to age.
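
For reference, the standard Kimura two-parameter distance can be sketched as below. This is the textbook formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions; it is not necessarily the exact implementation used in the manuscript, and `k2p_distance` is a hypothetical name. Because RIP introduces C-to-T changes (G-to-A on the opposite strand), which are transitions, RIP-affected copies inflate P independently of elapsed time.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

The distance only translates into relative age under an approximately constant substitution rate, which is the assumption this comment questions for RIPped sequences.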

Reviewer #4: Minor comments:

Lines 78-79: Preferred insertion site of TEs, is it worth mentioning here that many of these are “TA” sequences, which is also linked with RIP?

Lines 151-152: is TE “insertion” misleading? You analysed 5kb windows around each annotated TE? Or you only analysed TE’s where you clearly had an empty site in another genome? (possible as very few TEs fixed)

Line 173-174: RLC and RLG not defined, Focus of figures 1 and 2 are mainly of these families and MITEs, might be worth clarifying early in results why you focus on these families and define them.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Aaron Vogan

Reviewer #4: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example see here on PLOS Biology: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Decision Letter 1

Bart Thomma

13 Jan 2023

Dear Prof Croll,

Thank you very much for submitting your manuscript "Recent transposable element bursts are associated with the proximity to genes in a fungal plant pathogen" for consideration at PLOS Pathogens. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Bart Thomma

Section Editor

PLOS Pathogens

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

***********************

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Part I - Summary

Please use this section to discuss strengths/weaknesses of study, novelty/significance, general execution and scholarship.

Reviewer #1: The manuscript has been improved after the first review process. The discussion has become deeper, which attracts interest in the relationship of TE bursts with insertion into specific gene niches and the activity of a genomic defense mechanism (RIP). The finding of variation in possible burst mechanisms among TE families is also interesting. The paper can provide a subset of arranged information for TE science in plant pathogens. I suggest addressing the minor issues listed below.

Reviewer #3: The revised version of the manuscript is a significant improvement on the original. The authors have done an excellent job addressing all of the reviewers' comments. The clarity has increased dramatically, making for a pleasurable read. I think the manuscript is more or less ready for final publication in its current form, though I have noted some minor errors and have a couple of suggestions listed below. However, there is one point that I highly recommend the authors look over closely. In Figure 6D it appears the burst copies of RLC_Deimos are mostly inserted into genes. I would double check these to make sure that the TE ORFs aren't miscalled as genes. If it's not an error, this fact could be worth highlighting more in the text.

Additionally, I am aware of the debate regarding the family name "gypsy". I highly recommend using Ty3 as the name for the family, or at the very least Ty3/mdg4, especially as these are fungal TEs. In any case, please check the consistency, as sometimes RLG is used and sometimes mdg4 throughout the manuscript.

Reviewer #4: (No Response)

**********

Part II – Major Issues: Key Experiments Required for Acceptance

Please use this section to detail the key new experiments or modifications of existing experiments that should be absolutely required to validate study conclusions.

Generally, there should be no more than 3 such required experiments or major modifications for a "Major Revision" recommendation. If more than 3 experiments are necessary to validate the study conclusions, then you are encouraged to recommend "Reject".

Reviewer #1: The major issues raised in the first review round seem to be resolved, and I did not find additional major issues.

Reviewer #3: (No Response)

Reviewer #4: (No Response)

**********

Part III – Minor Issues: Editorial and Data Presentation Modifications

Please use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity.

Reviewer #1: Line 142: Delete "The loss of".

Line 211: RLC_Deimos seems an extreme case of "The GC content of high copy TE families" "lower than 50%" rather than "the exception".

Line 225: There is no "Figure 4H".

Line 246: The percentage of "a large part of TE copies" that "remain in large RIP-affected regions" is missing.

Lines 268-270: The description about RIP seems redundant within the sentence.

Lines 275-276: RII_Cassini seems to show a "similar pattern of temporal escape from RIP facilitating a burst" like RLC_Deimos though with a smaller degree.

Line 663: Correct "xx".

Line 709: 67-U139 → 67-77

Line 711: Bmc → BMC

Fig 1B: Why Copia/mdg4 instead of RLC/RLG?

Fig 7E, 8E and S3E: The values of the scale seem inconsistent between the phylogenetic tree and the correlation plot. If the scale for the correlation plot indicates log10 values, please mention that.

Reviewer #3: Line 120: "high number of low highly similar" -> confusing language, consider rephrasing.

Line 137 : "Has both produced both" -> remove one both; missing an "and"

Line 189: "is" -> are

Figure 1: details of map construction should be in methods

Figure 2: the LRAR density pattern is surprising, is it labelled properly?

Line 197: "LINES" no s

Figure 4: Much improved, great work!

Line 303: "General" -> generally

Line 316: "play also" -> also play

Line 402: One reason why copies near genes may not be targeted is due to RIP spill over, where adjacent sequence is mutated along with the targeted duplicate region. This could be mentioned here to explain the "secure niche".

Supplemental figure 5: "Netherland" -> Dutch; error with "accessory chromosome" labels.

Supplemental table 1: are the underscores in the "isolate" column intentional?

Reviewer #4: This is a substantially revised manuscript that addresses most of the issues raised in my previous review.

One outstanding area that I feel could be explained more clearly is the ancestral reconstruction dot plots shown in figures 7 and 8. I still struggle to fully understand this analysis and feel that some additional explanatory text in the results, or in the methods (lines 523-529), may help clarify it. For example, regarding the phrase “We imported the trees into R using read.tree..”: which tree did you import as the starting point for this analysis, the highest scoring ML tree? Or “We added metadata using ddplyr…”, which I assume means you assigned each sequence in your tree (each tip) the data associated with that particular TE copy. Would the authors like to state more clearly what a strong correlation, or the lack thereof, signifies (i.e. does a strong correlation indicate little change through time, or the inverse)?
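
One possible reading of an ancestor-offspring dot plot is sketched below. It assumes each node of the tree carries a trait value (observed at the tips, reconstructed at internal nodes) and that every branch contributes one (ancestor, offspring) point. The tree encoding (`parent_of`), the trait values and the function names are hypothetical, and the sketch does not reproduce the authors' R workflow.

```python
import math


def ancestor_offspring_pairs(parent_of: dict[str, str],
                             value: dict[str, float]) -> list[tuple[float, float]]:
    """One (ancestor value, offspring value) pair per branch of the tree."""
    return [(value[parent], value[child]) for child, parent in parent_of.items()]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy tree: root -> n1, n2; n1 -> t1, t2; n2 -> t3, t4 (values are invented).
parent_of = {"n1": "root", "n2": "root",
             "t1": "n1", "t2": "n1", "t3": "n2", "t4": "n2"}
niche_gc = {"root": 0.50, "n1": 0.48, "n2": 0.52,
            "t1": 0.47, "t2": 0.49, "t3": 0.52, "t4": 0.55}

pairs = ancestor_offspring_pairs(parent_of, niche_gc)
r = pearson([a for a, _ in pairs], [o for _, o in pairs])
```

Under this reading, a correlation close to 1 means offspring values track their ancestors closely along branches, whereas a weak correlation means the trait changes substantially between ancestor and offspring. Whether this matches the intended interpretation of the published plots is exactly what the comment asks the authors to state.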

Other minor typos spotted throughout the text:

Line 71: tough → though

Line 142: “the loss of cytosine methylation was lost..”

Line 160: “Escape from genomic defenses including silencing is a likely the major driver of TE dynamics.” This sentence stuck out to me because silencing was not really measured in this study.

Lines 174-176: “Half (n=106) … half to retrotransposons (n-59)”. Does not seem quite correct to use “half”.

Line 189-190: “We found no overall association between TE copies and large RIP-affected regions.” I think this comes with a small caveat that LRAR can completely destroy some TE copies making them impossible to detect.

Line 324: insert “by” …were strongly affected BY RIP prior to the bursts.

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: Yes: Aaron A. Vogan

Reviewer #4: No

References:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Decision Letter 2

Bart Thomma

18 Jan 2023

Dear Prof Croll,

We are pleased to inform you that your manuscript 'Recent transposable element bursts are associated with the proximity to genes in a fungal plant pathogen' has been provisionally accepted for publication in PLOS Pathogens.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Pathogens.

Best regards,

Bart Thomma

Section Editor

PLOS Pathogens

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

***********************************************************

Reviewer Comments (if any, and for reference):

Acceptance letter

Bart Thomma

7 Feb 2023

Dear Prof Croll,

We are delighted to inform you that your manuscript, "Recent transposable element bursts are associated with the proximity to genes in a fungal plant pathogen," has been formally accepted for publication in PLOS Pathogens.

We have now passed your article onto the PLOS Production Department who will complete the rest of the pre-publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: Proofs for Front Matter articles (Pearls, Reviews, Opinions, etc...) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript, if you opted to have an early version of your article, will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Pathogens.

Best regards,

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Hierarchy of TE superfamilies: Classes, subclasses, orders, superfamilies as well as the three-letter code according to Wicker et al (2007) [45].

    The Z. tritici specific family names are according to Badet et al (2020) [52].

    (PDF)

    S2 Fig. Characteristics of high-copy TE families: TE families are ordered from the highest copy numbers (left) to the lowest copy numbers (right) in all 19 analyzed genomes combined.

    (A) Total copy numbers. (B) Long (> 0.00001; red) and short (≤0.00001; blue) terminal branch lengths of individual copies characterizing two classes of divergence times.

    (PDF)

    S3 Fig. Phylogenetic tree of the TE family DTX_MITE_Goblin: (A) Phylogenetic tree with colors indicating the number of RIP-like mutations.

    The black bar marks the different burst clades. The dot plot shows the changes in RIP-like mutations from the ancestor to offspring for all internal and terminal branches from the ancestral state reconstruction. (B-E) Phylogenetic trees and ancestor-offspring changes for (B) the GC content of the niche, (C) the overlap of the niche with large RIP affected regions, (D) the GC content of the copy and (E) the distance to the closest gene.

    (PDF)

    S4 Fig. TE copy frequency.

    Comparison of TE copy frequency between outgroups of burst and all other TE copies.

    (PDF)

    S5 Fig. Genomic environment of the Z. tritici reference genome IPO323.

    Circos plot showing the genomic environment of the reference genome IPO323 (Dutch isolate). From outside to inside, the tracks show the GC content, gene content and TE content in windows of 10 kb, the presence of large RIP affected regions (LRAR), and the histone marks H3K4, H3K27 and H3K9. Chromosomes 1–13 are core chromosomes that are present in each isolate, while chromosomes 14–21 are accessory chromosomes that are not shared among all isolates.

    (PDF)

    S6 Fig. Procedure to obtain multiple sequence alignments among copies of TE families.

    Due to the high number of nested insertions and partially deleted fragments, we aligned only coding regions.

    (PDF)

    S1 Table. Features of the global pangenome of Zymoseptoria tritici used for transposable element analyses.

    (XLSX)

    S2 Table. Description of all TE copies analyzed: assigned TE family, information about isolate of origin, position in the genome, niche and TE sequence characteristics.

    (XLSX)

    Attachment

    Submitted filename: Response_reviewers.docx

    Attachment

    Submitted filename: Response_reviewers.docx

    Data Availability Statement

    The genome assemblies and annotations are available at the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under the BioProject PRJEB33986. TE consensus sequences, raw sequences and phylogenetic trees are available on Zenodo (https://zenodo.org/record/7344421).


    Articles from PLOS Pathogens are provided here courtesy of PLOS
