Abstract
Genome size expansions are common among eukaryotic lineages. Enlarged genomes can be bioenergetically demanding, and active mobile elements can trigger chromosomal rearrangements and loss of gene function. What triggers genome size expansions remains largely unexplored in many biological clades, particularly within the fungal kingdom. Activation of large transposable elements (TEs), such as long terminal repeat (LTR) retrotransposons, is a common contributor. Yet the mechanisms of LTR activation remain poorly understood. Here, we focus on the fungal genus Pseudocercospora and closely related species with known variation in genome size. Using an assembly-free approach, we found that TE content is highly variable among species, with species-specific retrotransposon families being the main drivers of independent genome expansions. We further focused on the two species with the most expanded genomes and reference-quality assemblies, P. fijiensis and P. ulei. We found that the P. ulei genome is compartmentalized, with highly variable TE densities among chromosomal regions, and shows a striking reduction in pathogenicity-associated genes. Overall, our study indicates that species of Pseudocercospora originally had small genomes, and that genome expansions are species-specific and driven by heterogeneous sets of TE families. We discuss what might have caused TE activation and subsequent proliferation in the genus, including stress conditions and host adaptation. Surveys of clades with highly dynamic genome sizes are crucial for investigating the causal factors driving long-term TE dynamics.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13100-026-00396-x.
Introduction
Eukaryotic species have very diverse genome sizes, spanning a > 10,000-fold range in haploid genome size [1, 2]. Genome size expansions occurred widely among eukaryotic lineages and may be linked to speciation [3, 4]. Genome size expansions mostly arise from duplication of existing sequences and are thought to have minimal impact on functional complexity, including the number of gene families or regulatory sequences [5–7]. Duplications range in scale from whole-genome duplications and aneuploidy to large structural variants and the duplication of individual genes or transposable elements (TEs) [1, 8–12]. TEs are mobile genetic units that can move or copy themselves and insert into different genomic regions [13, 14]. TEs are not monophyletic in origin: they transpose either via RNA intermediates, which a reverse transcriptase converts into new DNA copies, or via excision and insertion cycles [15, 16]. Autonomous TEs encode all proteins needed for their own transposition, whereas non-autonomous TEs can transpose by parasitizing the machinery of autonomous TEs [17]. TEs are relatively small, ranging from a few dozen base pairs (bp) in certain non-autonomous elements to several kilobases (kb) for typical retrotransposons, and up to 700 kb for complex, clustered TE constructs such as Starships [15, 18]. Individual TE insertions do not increase genome size substantially. However, ongoing proliferation led, for example, the ~ 300 bp Alu element to accumulate over a million copies covering about 10% of the human genome, the proliferation of Miniature Inverted-repeat Transposable Elements (MITEs) to cause remarkable genome size differences among rice genomes, and retrotransposons to make up around 60% of the genome of the fungus Cenococcum geophilum [19–21]. Stress conditions can induce bursts of TE activity, leading to increased copy numbers over short evolutionary timescales [22, 23].
Furthermore, even silenced or non-functional TEs can impact genome size evolution, by inducing large-scale chromosomal rearrangements, duplications or deletions via ectopic recombination [24–26].
At the species and population level, TEs are a mutational force. Most new TE insertions are likely deleterious and under strong purifying selection, and hence typically remain undetectable in populations [27]. Accordingly, diverse mechanisms have evolved to suppress TE activity. Eukaryotes use epigenetic silencing to prevent TEs from becoming active [28], including DNA methylation and histone modifications [29, 30]. Silencing can be reversible, especially under stress conditions [22]. In addition, many ascomycete fungi have a genome defense mechanism against repeats called repeat-induced point mutation (RIP), which introduces C-to-T transition mutations into duplicated sequences [31]. However, individual TE insertions can also be beneficial. For instance, in the wheat pathogen Zymoseptoria tritici, a TE insertion upstream of the Zmr1 gene regulates variation in melanin accumulation [32]. TE-derived chromosomal rearrangements in industrial and sea floor-adapted strains of Penicillium chrysogenum likely increased penicillin production [33, 34]. Over longer evolutionary time frames, TE-derived genes can also be co-opted into host gene functions [35, 36].
The fungal genus Pseudocercospora contains over 300 species [37], predominantly host-specific fungal pathogens that pose a threat to agricultural and natural ecosystems [38]. Pseudocercospora spp. are globally distributed, with a concentration in tropical and subtropical environments [39]. Key pathogens include P. fijiensis, P. musae, and P. eumusae, which collectively form the Sigatoka disease complex affecting banana crops, and P. ulei, which causes South American Leaf Blight of the natural rubber tree Hevea brasiliensis [40–42]. Despite their significance in agriculture and the environment, only a few Pseudocercospora species have been sequenced. The existing genome assemblies strongly indicate that the genus has undergone substantial genome size changes [42–47]. Pseudocercospora fijiensis and P. ulei have genome sizes of 74 Mb and 93.8 Mb, respectively, among the largest genomes in the Capnodiales [21, 47]. Both genomes were reported to harbor large numbers of repeats, including TEs [42, 46, 48]. In P. fijiensis, a hAT element captured histone H3 gene sequences and generated 784 copies, which were ultimately targeted by RIP throughout the repetitive regions [49]. Compared to P. fijiensis, P. ulei lacks detailed analyses of the TE families responsible for its genome expansion. Mating systems in Pseudocercospora are best known in the Sigatoka complex species, where gene analyses confirmed mating-type idiomorphs indicative of heterothallism [43, 50, 51]. Many Pseudocercospora and Cercospora genomes contain numerous expressed short MAT-related gene fragments outside the MAT1 locus, whose origin and function remain unknown [52]. While many Pseudocercospora species reproduce both sexually and clonally, the extent of sexual reproduction across the genus remains poorly characterized [43, 53].
Here, we analyzed genome size expansion in the genus Pseudocercospora and closely related sister clades. We found that P. musae, P. fijiensis and P. ulei have the most expanded genomes, without a corresponding increase in the number of annotated genes. To detect genes potentially involved in host interactions, we analyzed pathogenicity-associated genes in detail and found a strong reduction in P. ulei. We then compared TE content among species and found substantial variation, indicating ongoing TE activity after speciation. Specific retrotransposons were the primary drivers of genome size expansion, with distinct families contributing to independent expansions. Finally, by analyzing genome sequencing data of six strains collected across Colombia, we found that at least some TE activity persists within P. ulei.
Results
Pseudocercospora includes the largest known genomes in the Mycosphaerellaceae
To characterize Pseudocercospora genome evolution comprehensively, we performed a comparative analysis including twenty phytopathogenic fungal species of the Mycosphaerellaceae family, using publicly available genome assemblies and gene annotations where possible (Supplementary Table S1). Of these, ten species belong to Pseudocercospora, i.e., P. fijiensis, P. musae, and P. eumusae, which are part of the Sigatoka disease complex affecting banana crops [43], P. ulei, a major threat to natural rubber crops [46], P. macadamiae, the causal agent of husk spot of macadamia [44], P. cruenta, responsible for Cercospora leaf spot of cowpea, P. pini-densiflorae, a cosmopolitan pathogen of various pine species, P. fuligena, a tomato pathogen causing black leaf mold [45], P. vitis, the causal agent of Isariopsis leaf spot of Vitis spp. [52], and P. crystallina, a pathogen of eucalyptus [37, 53]. Another eight species from closely related sister clades were included, i.e., Cercospora beticola, C. berteroae, C. zeina, Sphaerulina musiva, Dothistroma septosporum, Zasmidium cellare, Ramularia collo-cygni and Zymoseptoria tritici. Genome assemblies and annotations of two additional species closely related to Pseudocercospora, Paracercospora egenula and Rhachisphaerella mozambica, were generated in this study.
BUSCO scores for Pseudocercospora and sister clade genomes indicated high completeness, ranging from 93.8% in P. musae to 99.0% in P. cruenta, C. berteroae and S. musiva (Supplementary Figure S1). Genome assembly contiguity (N50) ranged from 42.9 kb in P. eumusae to 5.9 Mb in P. fijiensis (Fig. 1; Supplementary Table S1). The largest genomes belonged predominantly to Pseudocercospora (Fig. 1, highlighted in mauve). Genome sizes of all analyzed species ranged from 29.3 Mb in S. musiva to 93.7 Mb in P. ulei. To compare the number of predicted coding regions per genome, we annotated genes for seven species lacking publicly available annotations, i.e., P. crystallina, P. cruenta, P. fuligena, P. vitis, P. pini-densiflorae, and the outgroups Pa. egenula and Rh. mozambica (annotations available on Zenodo: https://zenodo.org/records/15862053). The number of annotated protein-coding genes in Mycosphaerellaceae species ranged from 7,342 to 16,015 (Fig. 1). Za. cellare and P. macadamiae had the highest numbers of annotated genes, with 16,015 and 15,430 genes, respectively. Conversely, P. vitis and P. crystallina had the lowest numbers, with 7,342 and 8,716 predicted genes, respectively (Fig. 1). These low counts may partly reflect the lack of species-specific training of the gene prediction algorithm. Pseudocercospora ulei has undergone a substantial genome expansion compared to all other species in this study. The genome of P. ulei was three times larger than that of its closest relative, P. vitis, and 1.2 times larger than that of P. fijiensis, the second-largest genome in this study. The three largest genomes, those of P. ulei, P. fijiensis and P. musae, each showed a genome size increase without a corresponding increase in gene number.
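As a reminder of how the contiguity metric is defined, N50 is the length of the shortest contig (or scaffold) among the longest contigs that together cover at least half of the assembly. The Python sketch below illustrates only this standard definition; it is not the tool used to compute the values in Supplementary Table S1, and the example lengths are arbitrary.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig (or scaffold) lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig at which the cumulative length reaches half the assembly is the N50
        if running >= half_total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


# Toy example with arbitrary lengths (bp): 40 + 30 = 70 >= 50, so N50 = 30
print(n50([10, 20, 30, 40]))  # prints 30
```

Because N50 depends only on the length distribution, highly fragmented assemblies of repeat-rich genomes (e.g., many short contigs collapsing TE copies) tend to have low N50 values even when BUSCO completeness is high.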
Fig. 1.
Genome assembly quality metrics of Pseudocercospora and closely related species including assembled genome size, gene content and N50 values. The dot color represents the genome assembly completeness score, which was assessed with the number of complete single copy BUSCO (Benchmarking Universal Single-Copy Orthologs) genes. Dot size indicates N50. Genomes from the Pseudocercospora genus are labeled in mauve
Phylogenomic analyses reveal independent genome size expansions
The phylogenetic relationships of Pseudocercospora and closely related species within the Mycosphaerellaceae were assessed using Blumeria graminis (Erysiphaceae) as an outgroup to root the tree. As expected, the species grouped into three distinct clades (Fig. 2A). Clade A contained all Pseudocercospora species except P. crystallina. The two newly assembled species, Rh. mozambica and Pa. egenula, clustered within clade C. The closest relatives of the massively expanded P. fijiensis and P. ulei genomes each had small genomes.
Fig. 2.
Phylogenetic relationship and pathogenicity-associated genes in Mycosphaerellaceae family genomes. A Phylogenetic tree of species within the Mycosphaerellaceae family. Dot plots represent the genome size. Genomes belonging to the Pseudocercospora genus are filled in mauve. Blumeria graminis was used to root the tree as an outgroup. B Secreted protein profiles of species within the Mycosphaerellaceae family. Left: The gray background represents the total proteome, and dark orange indicates the predicted secretome. Middle and right: The gray background represents the predicted secretome, while carbohydrate-active enzymes (CAZymes) are shown in purple and effector candidates are shown in yellow. C Secondary metabolite gene clusters in species of the Mycosphaerellaceae family. The colors indicate the different categories of secondary metabolite gene clusters
Reduction in pathogenicity-associated genes in P. ulei
To assess whether genomes experienced an expansion of pathogenicity-associated genes, we estimated the number of secretome candidates, with a focus on carbohydrate-active enzymes (CAZymes) and effectors (Fig. 2B). A diverse CAZyme repertoire enables an organism to use various carbon sources and to adopt various lifestyles. Broad repertoires typically reflect saprotrophic lifestyles and complex environments, whereas small CAZyme repertoires are often associated with a high degree of specialization (e.g., on a specific plant host). As expected, secretome candidates made up only a small share of the entire proteome. Secretome size varied within Pseudocercospora, from the largest in P. macadamiae (n = 921 proteins) to the smallest in the closely related P. ulei (n = 212). Pseudocercospora vitis had a reduced proteome, but a number of secretome candidates similar to other species in the genus. Pseudocercospora ulei also had reduced numbers of CAZymes (n = 102) and effector candidates (n = 73). Moreover, CAZymes and effectors made up a lower proportion of the secretome in P. ulei than in other Pseudocercospora and closely related species. Secondary metabolite gene clusters showed similar numbers and category proportions across the genus (Fig. 2C). Pseudocercospora ulei also had fewer secondary metabolite gene clusters, with the strongest proportional reductions in the T1PKS and NRPS categories. The number of pathogenicity-associated genes may be correlated with lifestyle in fungi [54]. We found the largest CAZyme and effector repertoires in P. macadamiae, which has been described as a necrotroph, and the smallest in the biotroph P. ulei [44, 46]. As with total gene content, pathogenicity-associated genes and gene clusters were not correlated with the genome size expansions in P. fijiensis and P. ulei.
Genome size increases associated with TE expansions
To compare repeat content objectively among Pseudocercospora and closely related species, we used an assembly-free approach based on short-read sequencing data. Such an approach likely underestimates repeat content but removes the bias stemming from unequal genome assembly quality. We used dnaPipeTE for assembly-free repeat detection, assessing the following repeat types: low complexity, rRNA repeats, simple repeats, and TEs. TEs were further classified into LTR retrotransposons, LINEs and DNA transposons. Repeat content varied strongly within Pseudocercospora and closely related species, ranging from 1.63% in P. macadamiae to 71.02% in its close relative P. ulei (Fig. 3). Pseudocercospora macadamiae, P. pini-densiflorae and Rh. mozambica showed very low repeat contents of less than 5%, and all have small genomes. In P. cruenta and P. eumusae, around a quarter of the genome was covered by repeats, and in P. musae, P. fijiensis and Pa. egenula, around half. Generally, closely related species showed drastically different repeat contents. We observed a significant correlation (Pearson's r = 0.8, p = 0.01) between genome size and the proportion of repetitive sequences across Pseudocercospora species, indicating that genome size expansion is likely driven largely by the proliferation of repetitive elements. Additionally, we found a strong negative correlation (r = −0.84, p = 0.004) between repetitive content and the proportion of pathogenicity-associated genes in Pseudocercospora genomes. Most repeats in genomes with moderate to high repeat content were unclassified TEs. The failure of dnaPipeTE to classify these elements likely stems from fragmentation or low coverage. TE lengths varied among genomes; for example, most TEs detected in P. ulei were shorter than 500 bp (Supplementary Figure S2). LTR retrotransposons remained below 2.7% across the genus, except in P. ulei, where they accounted for ~ 30% of the genome. Other TE types were detected only at low proportions. LINEs were detected only in P. eumusae and P. ulei, and DNA transposons only in P. musae. Simple repeats expanded slightly in P. cruenta and P. ulei. Repeats of low complexity and rRNA remained below 1.5% throughout the genus. Neither repeat content nor repeat composition tracked the phylogenetic position of the species, indicating likely independent bursts of repeat activation. Despite the strongly differing repeat contents among species, we detected no correlation with the presence of orthologs of DIM-2 and RID, two methyltransferases linked to the RIP machinery (Fig. 3). Only P. fijiensis and P. cruenta had orthologs with sufficient coverage and sequence similarity, indicating that RIP might not be functional in most Pseudocercospora species.
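The correlations reported above are Pearson tests on paired per-species values. A minimal sketch of the statistic is given below, assuming two equal-length lists (e.g., genome size and repeat proportion per species); the function name and toy values are illustrative and are not the study's data or code. Under the null hypothesis of no correlation, the returned t statistic follows a t distribution with n − 2 degrees of freedom, from which the p-value is obtained (scipy.stats.pearsonr, for example, returns r and p directly).

```python
import math


def pearson_r_t(x, y):
    """Pearson's r for paired samples and the t statistic used to test r = 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two paired samples of equal length with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined for a constant sample")
    r = s_xy / math.sqrt(s_xx * s_yy)
    # t = r * sqrt((n - 2) / (1 - r^2)); infinite for a perfect correlation
    if abs(r) >= 1:
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


# Toy paired values (arbitrary, not the study's data)
r, t = pearson_r_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 3), round(t, 3))  # prints 0.775 2.121
```

With only around ten species per test, a single extreme genome such as that of P. ulei can strongly influence r, so a rank-based alternative (Spearman) is a common sensitivity check.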
Fig. 3.
Repeat element distribution among Pseudocercospora genus genomes and two closely related species. Light gray indicates the estimated non-repetitive portion of the genomes. Dark grays indicate repeats of low complexity, rRNA and simple repeats. Green and blue colors indicate TEs. The order of the plots largely follows the phylogenetic grouping (Fig. 2A). The boxes on the right indicate the presence of methyltransferases that are linked to RIP functionality
LTR retrotransposons underpin genome size expansions in Pseudocercospora
The assembly-free TE detection with dnaPipeTE indicated strong, phylogeny-independent TE expansions. To further clarify the dynamics of genome size expansion, we used the Earl Grey pipeline to produce high-quality TE annotations and classifications for the two high-quality expanded genomes of P. fijiensis and P. ulei. TE coverage estimated by Earl Grey was similar to the estimates from dnaPipeTE (Fig. 4A; Earl Grey TE family consensus sequences and annotations are available on Zenodo: https://zenodo.org/records/15862053; TE family names are species-specific and do not indicate shared families). However, access to the full genome sequence improved the classification of individual TE families, and length estimates of TE fragments were also more robust. Consistent with the assembly-free approach, most TE copies belonged to LTR retrotransposons. In P. fijiensis, LTR retrotransposons covered 29.4% (21.8 Mb) of the genome, followed by unclassified TEs (13.3%, 9.8 Mb), DNA transposons (3.9%), rolling-circle elements/Helitrons (2.7%), and LINEs (1.7%). These proportions are consistent with previous annotations by Chang et al. (2016), who reported 31.7% LTRs, 9.9% unclassified repeats, 1.3% Helitrons and 5.1% LINEs. In P. ulei, LTR retrotransposons made up the largest part of the genome (74%, 69.5 Mb), followed by a small fraction of unclassified TEs (3.1%, 2.9 Mb) and satellite sequences (0.5%). Overall, most detected TE fragments, which include both full-length elements and copies fragmented by nested insertions, were either LTR retrotransposons or remained unclassified (Fig. 4B). The diversity of TE superfamilies was larger in P. fijiensis (n = 22) than in P. ulei (n = 8). Notably, most LTR retrotransposons belonged to the RLG superfamily (formerly known as Gypsy and to be renamed, see [55]), with 4,787 TE fragments in P. fijiensis and 26,795 TE fragments in P. ulei.
Other LTR retrotransposons were found only at low copy numbers. Given the evidence for extensive TE fragmentation, we compared the lengths of LTR retrotransposon fragments between the two species and Z. tritici, whose TEs were subject to extensive manual curation [56, 57]. Consistent with the assembly-free approach, the mean length of LTR retrotransposons was lower in P. ulei (2,569 bp) than in the other species (3,996 bp in P. fijiensis and 3,790 bp in Z. tritici). Most TE fragments in P. ulei were longer than 500 bp, suggesting that most represent reliable annotations (Fig. 4C).
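The length comparison above can be sketched as follows, assuming TE annotations have already been parsed into (classification, start, end) tuples with 0-based, half-open coordinates and RepeatMasker-style class labels such as "LTR/Gypsy". The record layout, the function name and the toy records are hypothetical and serve only to illustrate the summary statistics (mean fragment length and share of fragments longer than 500 bp).

```python
def ltr_length_summary(records, min_length=500):
    """Mean length of LTR retrotransposon fragments and the share longer than min_length.

    records: iterable of (classification, start, end) tuples with 0-based,
    half-open coordinates; classifications are assumed to look like "LTR/Gypsy".
    """
    lengths = [end - start for cls, start, end in records if cls.startswith("LTR")]
    if not lengths:
        return None, None
    mean_length = sum(lengths) / len(lengths)
    share_long = sum(1 for n in lengths if n > min_length) / len(lengths)
    return mean_length, share_long


# Hypothetical annotation records (not study data); the DNA/hAT record is ignored
records = [("LTR/Gypsy", 0, 1000), ("LTR/Copia", 100, 3100),
           ("DNA/hAT", 0, 5000), ("LTR/Gypsy", 50, 250)]
mean_length, share_long = ltr_length_summary(records)
print(mean_length, round(share_long, 2))  # prints 1400.0 0.67
```

Reporting both values matters here: a low mean length alone cannot distinguish many short but genuine fragments from spurious short hits, whereas the share above a length cutoff speaks to annotation reliability.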
Fig. 4.
TE coverage in the two enlarged genomes of P. fijiensis and P. ulei based on high-quality detection and classification with Earl Grey. A Total length of TEs per genome. The colors indicate the TE category. B Number of TE copies assessed by Earl Grey for the different TE superfamilies. Copies include both full-length TEs and fragments. Colors indicate the species. C Distribution of LTR retrotransposon fragment lengths detected in P. fijiensis and P. ulei compared to the manually curated TE content of the Z. tritici outgroup. LTR retrotransposons include both full-length and fragmented copies
The genome-wide analysis of TEs revealed 391 species-specific families in P. fijiensis and 277 in P. ulei, with no shared TE families. To identify candidate TE families responsible for recent bursts of TE activity and subsequent genome size expansion, we filtered for families with more than 200 copies. The P. fijiensis genome contained only two such families, whereas P. ulei contained 29, one of which had > 800 copies (Fig. 5A). The high-copy TE families belonged to RLG and Helitrons in P. fijiensis, and to RLG in P. ulei, where two high-copy families remained unclassified. Taken together, this suggests that the repeat expansion in P. fijiensis likely stems from a higher diversity of low-copy TEs, whereas the high copy numbers of fewer TE families in P. ulei suggest a more recent burst.
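The copy-number filter described above amounts to counting annotated copies per family and keeping families above the cutoff. A minimal sketch, assuming one family label per annotated copy (full-length or fragment) has already been extracted from the annotation; the function name and family labels are hypothetical.

```python
from collections import Counter


def high_copy_families(family_per_copy, min_copies=200):
    """Return {family: copy_number} for families with more than min_copies annotated copies.

    family_per_copy: one family name per annotated copy (full-length or fragment).
    """
    counts = Counter(family_per_copy)
    return {family: n for family, n in counts.items() if n > min_copies}


# Hypothetical family names and counts, for illustration only
copies = ["fam_A"] * 250 + ["fam_B"] * 200 + ["fam_C"] * 12
print(high_copy_families(copies))  # prints {'fam_A': 250}
```

Because fragments are counted individually, a single inserted element later split by nested insertions contributes several "copies", which is one reason the manual curation step that follows matters for interpreting high-copy families.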
Fig. 5.
TE families with high copy numbers in P. fijiensis and P. ulei based on high-quality detection and classification with Earl Grey. A Correlation of copy number and average length of species-specific LTR retrotransposons. TEs include both full-length TEs and fragments. Mauve dots indicate TE families with more than 200 copies. B Copies of high-copy number TE families annotated in P. fijiensis and P. ulei. Colors indicate the superfamily. Each TE family is indicated by a box, and box size indicates the number of copies
To improve TE classification and to reduce the number of fragments erroneously classified as full-length elements, we manually curated TE families in P. ulei, created consensus sequences and renamed the remaining TE families according to the three-letter code of Wicker et al. (2007). Many RLG or unclassified families detected by Earl Grey were fragments of the newly named RLG_Mira family and, to a lesser extent, of RLG_Ginan. Based on the estimated lengths of the TE consensus sequences, we confirmed that most TE copies in P. ulei were fragments covering less than 80% of the full-length element (Supplementary Figure S3A). Among high-copy retrotransposons, lengths remained highly variable, especially for the two high-copy families RLG_Mira and RLG_Ginan (Supplementary Figure S3B). A phylogenetic tree of RLG_Mira coding regions revealed two well-differentiated clusters (Supplementary Figure S3C). RLG_Mira coding regions almost exclusively showed moderate to high GC content, indicating that this element was not affected by RIP despite being recently active.
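The two checks in the paragraph above, the 80% length criterion and GC content as a proxy for RIP, can be illustrated with a short sketch. The function names are hypothetical; the sketch assumes copy and consensus lengths are known in bp and that ambiguous bases (e.g., N) should be excluded from the GC calculation. Low GC in a repeat is only an indirect hint of RIP, since RIP converts C:G to T:A pairs.

```python
def classify_copy(copy_length, consensus_length, min_fraction=0.8):
    """Label a TE copy 'full-length' if it spans at least min_fraction of its consensus."""
    if consensus_length <= 0:
        raise ValueError("consensus length must be positive")
    return "full-length" if copy_length / consensus_length >= min_fraction else "fragment"


def gc_content(seq):
    """GC fraction of a sequence, ignoring ambiguous bases such as N."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return None  # no informative bases
    return (s.count("G") + s.count("C")) / acgt


print(classify_copy(850, 1000), classify_copy(500, 1000))  # prints full-length fragment
print(gc_content("GGCCAATT"))  # prints 0.5
```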
Genomic landscape of the reference-quality Pseudocercospora genomes
To identify retrotransposons located close to genes, we calculated the distance between each annotated gene and the closest retrotransposon (Fig. 6A). TEs were generally closer to genes in P. ulei (mean = 14,922 bp) than in P. fijiensis (mean = 17,785 bp) and Z. tritici (mean = 68,130 bp). Only a few direct overlaps were detected in P. fijiensis (n = 98, 0.8% of all genes) and Z. tritici (n = 31, 0.3% of all genes), whereas overlaps were frequent in P. ulei (n = 6,441, 51.1% of all genes). Around 10% of annotated genes in P. ulei may be misannotated and instead represent TE coding regions (Supplementary Figure S4). Furthermore, we analyzed the density of genes, of TEs overall and of RLG_Mira in 10-kb windows along the 12 largest scaffolds of P. ulei (Fig. 6B). We found a strong compartmentalization into TE-rich regions with fewer genes and gene-rich, TE-depleted regions. RLG_Mira elements were present in most TE-rich regions, but the extent of overlap varied. Next, we overlaid large RIP-affected regions, which showed a distribution similar to that of TE-rich regions but not to that of RLG_Mira copies, indicating that RIP might no longer be active or might not be triggered by RLG_Mira. Effector and CAZyme candidates were detected in each compartment type. The strong compartmentalization of the P. ulei genome may indicate strong purifying selection against new TE insertions in gene-rich regions and relaxed selection in TE-rich regions. The proximity of TEs to some genes could also stem from TEs misannotated as genes. Finally, compartmentalization could arise from TEs preferentially targeting specific genomic regions. However, testing such hypotheses would require P. ulei to be experimentally tractable with molecular tools.
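The gene-to-nearest-retrotransposon distance can be computed efficiently per scaffold with sorted start positions and a running maximum of end positions. The sketch below is a hypothetical illustration, not the authors' pipeline (bedtools closest with its -d option is a common command-line alternative). It assumes genes and TEs are (scaffold, start, end) tuples with 0-based, half-open coordinates and that TEs have already been filtered to retrotransposons; overlapping and directly adjacent features receive distance 0, and genes on scaffolds without TEs receive None.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def nearest_te_distances(genes, tes):
    """Distance (bp) from each gene to its closest TE on the same scaffold."""
    by_scaffold = defaultdict(list)
    for scaffold, start, end in tes:
        by_scaffold[scaffold].append((start, end))
    index = {}
    for scaffold, intervals in by_scaffold.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # Running maximum of TE ends, so long TEs spanning a gene are not missed
        max_end = list(accumulate((e for _, e in intervals), max))
        index[scaffold] = (starts, max_end)
    distances = []
    for scaffold, g_start, g_end in genes:
        if scaffold not in index:
            distances.append(None)
            continue
        starts, max_end = index[scaffold]
        i = bisect_left(starts, g_end)  # TEs [0, i) start before the gene ends
        best = None
        if i > 0:
            best = max(0, g_start - max_end[i - 1])  # 0 if any such TE reaches the gene
        if i < len(starts):
            right = starts[i] - g_end  # first TE starting at or after the gene end
            best = right if best is None else min(best, right)
        distances.append(best)
    return distances


# Hypothetical coordinates, for illustration only
genes = [("s1", 150, 180), ("s1", 250, 300), ("s1", 420, 480), ("s2", 0, 10), ("s3", 5000, 5100)]
tes = [("s1", 500, 600), ("s1", 100, 200), ("s3", 20, 30), ("s3", 0, 10000)]
print(nearest_te_distances(genes, tes))  # prints [0, 50, 20, None, 0]
```

Counting genes with distance 0 then gives the number of direct overlaps, and the mean of the non-None values gives a summary comparable in spirit to the per-species means above.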
Fig. 6.
TE landscape in P. fijiensis, P. ulei and Z. tritici. A Variation in distances between genes and the closest retrotransposon. Genes with more than one TE insertion were treated as a single insertion. B Circos plot visualizing the genomic landscape for P. ulei. Genes, TEs and RLG_Mira content were calculated for window sizes of 10 kb. Large RIP affected regions (LRAR) indicate regions of at least 4 kb with a high RIP composite index. The RIP composite index is calculated as described in Van Wyk et al. as [TpA/ApT] – [(CpA + TpG)/(ApC + GpT)], and a RIP composite index above is considered RIP-positive [58]
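The RIP composite index in the caption above, [TpA/ApT] – [(CpA + TpG)/(ApC + GpT)] [58], can be computed from overlapping dinucleotide counts. The sketch below also scans sliding windows and merges consecutive RIP-positive windows into candidate large RIP-affected regions of at least 4 kb. The function names, the 1,000 bp window, the 500 bp step and the positive-index threshold (> 0) are assumptions for illustration, not parameters confirmed by this study; windows whose denominators are zero are treated as uninformative.

```python
from collections import Counter


def rip_composite_index(seq):
    """[TpA/ApT] - [(CpA + TpG)/(ApC + GpT)] from overlapping dinucleotide counts."""
    s = seq.upper()
    counts = Counter(s[i:i + 2] for i in range(len(s) - 1))
    at = counts["AT"]
    ac_gt = counts["AC"] + counts["GT"]
    if at == 0 or ac_gt == 0:
        return None  # index undefined for this sequence
    product = counts["TA"] / at
    substrate = (counts["CA"] + counts["TG"]) / ac_gt
    return product - substrate


def find_lrar(seq, window=1000, step=500, min_len=4000, threshold=0.0):
    """Merge consecutive RIP-positive windows; keep merged regions >= min_len (0-based, half-open)."""
    regions, current = [], None
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        idx = rip_composite_index(seq[start:end])
        if idx is not None and idx > threshold:
            if current is not None and start <= current[1]:
                current[1] = end  # extend the running region
            else:
                if current is not None:
                    regions.append(tuple(current))
                current = [start, end]
        elif current is not None:
            regions.append(tuple(current))
            current = None
    if current is not None:
        regions.append(tuple(current))
    return [(s, e) for s, e in regions if e - s >= min_len]


print(rip_composite_index("TAATACAGT"))  # prints 1.5
print(find_lrar("TAATACAGT" * 100, window=90, step=45, min_len=400))  # prints [(0, 900)]
```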
TE content analyses of P. ulei strains
To determine whether some TE activity persists in P. ulei, we sampled and whole-genome sequenced strains from natural rubber tree infections at three locations across Colombia (Fig. 7A). We assessed the genetic structure of the P. ulei strain collection using 1,802,029 genome-wide SNPs. Strains clustered into three distinct groups according to geography (Fig. 7A). We then mapped short-read data against the manually curated P. ulei consensus library to assess coverage and estimate the copy number of each TE family. TE copy numbers were largely stable and similar to the direct assessment in the reference genome (Fig. 7B). Two strains had slightly higher estimated TE copy numbers than the reference strain. Estimated TE copy numbers varied slightly, but independently of geographic origin. As in the reference genome, retrotransposons, and in particular the RLG superfamily, were overrepresented in the additional strains. At the TE family level, RLG_Mira and RLG_Ginan were the predominant components of the repetitive content in all P. ulei strains assessed, as in the reference genome (Fig. 7C). We observed small differences in TE family coverage among the six strains and the P. ulei reference genome data; however, these differences may reflect limitations of TE detection with short reads rather than biological variation. High TE content is, hence, a broadly shared pattern within the species, and the variability in content indicates that TE activity might be ongoing.
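The idea behind coverage-based copy number estimates is that reads from all genomic copies of a family pile up on its single consensus sequence, so the read depth on the consensus, relative to the depth of single-copy regions, approximates the number of copies. The sketch below illustrates only this idea under stated assumptions and is not McClintock's exact implementation: per-base depths along each consensus and a mean single-copy depth are assumed to be precomputed for one strain, and the family names and values are hypothetical.

```python
from statistics import mean


def estimate_copy_numbers(te_depths, single_copy_depth):
    """Approximate copies per TE family as mean read depth on its consensus
    divided by the mean read depth of single-copy regions.

    te_depths: {family: [per-base read depth along the consensus]}
    single_copy_depth: mean depth over single-copy regions of the same sample
    """
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return {family: mean(depths) / single_copy_depth
            for family, depths in te_depths.items() if depths}


# Hypothetical depths for two families in one strain (not study data)
depths = {"fam_A": [200] * 10, "fam_B": [30, 50]}
print(estimate_copy_numbers(depths, 10.0))  # prints {'fam_A': 20.0, 'fam_B': 4.0}
```

Because fragmented copies only cover part of the consensus, such estimates express copy number in full-length equivalents, which is one reason short-read estimates can differ modestly from counts in an assembled reference.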
Fig. 7.
Whole-genome sequencing and TE content analyses of six P. ulei strains collected in Colombia. A PCA of P. ulei strains based on linkage-pruned genetic variants, and their geographical locations. B Estimated coverage of P. ulei strains using the McClintock coverage analysis based on a curated TE library for the genus. The color represents the superfamily. C Estimated coverage of species-specific RLG TE families
Discussion
Studies tracking genome size evolution in fungi remain limited to a few groups, including the Pucciniales, Erysiphaceae and Glomeraceae [59–63]. Our study aimed to explore genome size dynamics within the species-rich genus Pseudocercospora, which consists predominantly of host-specific plant pathogens [37]. We found highly variable genome sizes even among closely related species, ranging from 30.9 to 93.7 Mb, with the genome of P. ulei being three times larger than that of its closest relative, P. vitis, and 1.2 times larger than the second-largest genome in the Mycosphaerellaceae (P. fijiensis). Our assembly-free approach based on low-coverage short reads allowed us to produce draft estimates of repeat content that were corroborated by the high-quality genome assemblies. Genome enlargement did not correlate with higher numbers of genes or of pathogenicity-associated genes. The observed differences in gene content among genomes may be influenced by heterogeneity between the gene prediction pipelines used for published annotations and for the de novo annotations in this study. However, we expect this effect to be minor and far below the reported variation in gene content. The largest genome, P. ulei, even showed a slight reduction in coding sequence content and a dramatically increased number of TE insertions into predicted genes. Genome expansions were largely caused by TE expansions, but the number and diversity of TE families involved varied. RLG retrotransposons were consistently involved in genome size expansions, yet no TE family was involved in more than one observed burst. Genome size expansions are likely phylogeny-independent and might be caused by the activation of specific TE families under stress conditions, or by horizontal transfer of TEs followed by bursts in the new host, as previously hypothesized [64–66].
To compare genomes of varying assembly and annotation quality, we used an assembly-free approach with short reads. dnaPipeTE detects TEs even from low-coverage sequencing data. The resulting repeat contents remain rough estimates, and most of the potential TE families were not classified. The use of short reads also made it harder to estimate genome coverage by TEs and other repeats, as reads are too short to span most TEs, and the presence of a TE fragment carries no information about the size of the full-length TE. However, when we used assembly-based TE detection with Earl Grey for the two best-assembled genomes, those of P. fijiensis and P. ulei, we found similar repeat contents. Earl Grey classified a larger number of TE families. The short-read approach was, however, not useful for fine-tuning TE classifications or for clearly assessing locus-specific TE insertions. In contrast, the tool's coverage analyses of the genomes yielded less skewed estimates of variation in TE content.
Our findings indicate that genome size expansions among Pseudocercospora species were largely caused by differences in TE content. We found striking differences in genome size and repeat content between P. ulei and its closest relative P. macadamiae, and between P. fijiensis and its closest relatives P. musae and P. eumusae. In addition, different, species-specific TE families were responsible for these individual expansions. Pseudocercospora ulei showed an almost exclusive expansion of LTR retrotransposons, namely of the two families RLG_Mira and RLG_Ginan. Pseudocercospora fijiensis also showed an expansion of RLG elements, but DNA transposons and LINEs contributed as well, and more TE families were involved, each with somewhat fewer copies. Pseudocercospora fijiensis TEs are known to be severely affected by the defense mechanism RIP [48]. Even though P. ulei shows many large RIP-affected regions, the expanded RLG_Mira coding sequences show no signs of RIP. No RID orthologs were detected in P. ulei and other Pseudocercospora species, indicating that the gene was either lost early during the diversification of the genus or is highly fragmented, and that the RIP machinery might no longer be functional. Losses of the RIP machinery are common among ascomycetes [67], and signatures of ancient RIP activity may persist in the genome, which could explain why most TE-rich regions, except those containing the recently active RLG_Mira, overlapped with large RIP-affected regions.
The TE-induced genome size expansions were most likely initiated by independent triggers, as the activated TE families were not shared by a direct ancestor. TE activity is often caused by stress conditions, under which previously silenced TEs are de-repressed and create new copies [22, 68]. For fungal plant pathogens, stress includes host defense responses, fungicide applications and climatic factors. Such stressors might have induced TE activity, initiating stepwise increases in genome size. Expansions would have been counteracted if TEs had frequently inserted into conserved regions, creating strong negative effects. Fungi might experience elevated stress when adapting to a new host species. H. brasiliensis, the host of P. ulei, produces a toxic latex with antifungal properties [69]. Depending on which host the last common ancestor of P. ulei and P. macadamiae occupied, one or both species might have encountered similar new stress conditions.
The clustering of TEs in specific chromosomal compartments and the prevalence of fragmented copies suggest that young TEs inserted into older TEs, leading to nested insertions. This process may be entirely due to an absence of selection against TEs inserting into TE-rich compartments, in contrast to purifying selection against TEs inserting into genic regions. Without an experimentally tractable system, it remains unknown whether some TEs also have insertion site preferences. P. ulei shows a reduced set of effectors, a characteristic shared with Oidium heveae, another pathogen of H. brasiliensis [70]. However, other fungal and oomycete pathogens of H. brasiliensis do not follow this pattern, showing no reduction or even an increase in effector gene content. Hence, a reduction of effector genes is unlikely to be a general adaptation to this host plant [71–74]. Analyses at the genus level and beyond can identify broad patterns among the factors influencing TE activity. Our study reinforces the observation that genome size expansions are initiated mostly in terminal branches, making direct observation of causal factors very challenging. Nevertheless, broad surveys of agriculturally relevant plant pathogen genera will build towards a more complete picture of the interplay between TEs, genome size and pathogen function.
Methods
Data acquisition and genome assembly
We compiled genome assemblies for 20 species of the Mycosphaerellaceae family, 18 of which were obtained from the National Center for Biotechnology Information (NCBI, Supplementary Table S1). Our comparative genome analysis focused on ten Pseudocercospora species from public databases: P. cruenta, P. crystallina, P. eumusae, P. fuligena, P. fijiensis, P. musae, P. macadamiae, P. ulei, P. pini-densiflorae, and P. vitis. Genome assemblies of eight additional related species were also retrieved from public databases: C. beticola, C. berteroae, C. zeina, S. musiva, D. septosporum, Za. cellare, R. collo-cygni and Z. tritici. Because no assemblies were available for two additional closely related species, Pa. egenula and Rh. mozambica, we produced draft genome assemblies from Illumina short reads. To this end, raw reads were first assessed with FastQC v.0.11.9 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads were pre-processed with Trimmomatic v.0.39 using the following parameters: ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:1:true TRAILING:2 SLIDINGWINDOW:4:15 MINLEN:90 [75]. Trimmed reads were assembled with SPAdes v.3.13.0 using the --careful parameter [77]. Assemblies were evaluated with QUAST v.5.2.0 [78]. Genome completeness was assessed based on orthologous gene content with BUSCO v.5.7.1 using the ascomycota_odb10 database [79]. For phylogenetic analyses, we included B. graminis as an outgroup to root the tree [76].
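To illustrate what the TRAILING:2 SLIDINGWINDOW:4:15 MINLEN:90 settings do to a single read, the Python sketch below applies simplified versions of these three steps, in the order given, to a list of Phred quality scores. It is not Trimmomatic's implementation: adapter clipping (ILLUMINACLIP) is omitted, Trimmomatic's exact cut-point rule inside a failing window may differ, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def trailing_trim(quals: Sequence[int], min_q: int) -> int:
    """Return the read length after removing 3' bases with quality below min_q."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return end


def sliding_window_trim(quals: Sequence[int], window: int, min_mean: float) -> int:
    """Scan from the 5' end and cut at the start of the first window whose
    mean quality drops below min_mean (simplified cut-point rule)."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return start
    return len(quals)


def trim_read(quals: Sequence[int], trailing: int = 2, window: int = 4,
              min_mean: float = 15, min_len: int = 90) -> Optional[int]:
    """Apply TRAILING, SLIDINGWINDOW and MINLEN in order; return the kept
    length, or None if the read is discarded."""
    kept = trailing_trim(quals, trailing)
    kept = sliding_window_trim(quals[:kept], window, min_mean)
    return kept if kept >= min_len else None


# A 125 bp read whose quality decays towards the 3' end.
read_quals = [30] * 100 + [10] * 20 + [1] * 5
print(trim_read(read_quals))  # 100 bases are kept
```

With MINLEN:90, a read that still passes the quality steps but ends up shorter than 90 bp (for example, 80 high-quality bases followed by low-quality ones) is dropped entirely.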
Phylogenetic analyses
We performed phylogenomic inference using 1315 orthologous BUSCO genes shared by all genomes. Amino acid sequences of these single-copy orthologs were aligned with MAFFT v.7.310 using the parameters --genafpair --maxiterate 1000 [80]. Maximum likelihood trees were estimated for each alignment with IQ-TREE v.2.2.5 using 1000 bootstrap replicates [81]. The individual gene trees were then concatenated into a single file, and a species tree was estimated with ASTRAL v.5.7.8 using default parameters, with B. graminis as the outgroup to root the tree [82].
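Selecting the orthologs shared by all genomes amounts to a set intersection over per-genome BUSCO results. The sketch below assumes each genome's results have already been parsed into a mapping from BUSCO identifier to status, using the status labels we expect in BUSCO's full_table.tsv ("Complete" for single-copy, "Duplicated", "Fragmented", "Missing"); the parsing step, genome names and identifiers are hypothetical.

```python
from typing import Dict, Set

# Status label assumed for complete single-copy BUSCOs.
SINGLE_COPY = "Complete"


def shared_single_copy(busco_tables: Dict[str, Dict[str, str]]) -> Set[str]:
    """Return BUSCO IDs that are complete and single-copy in every genome."""
    per_genome = [
        {busco_id for busco_id, status in table.items() if status == SINGLE_COPY}
        for table in busco_tables.values()
    ]
    return set.intersection(*per_genome) if per_genome else set()


tables = {
    "genome_A": {"b1": "Complete", "b2": "Duplicated", "b3": "Complete"},
    "genome_B": {"b1": "Complete", "b2": "Complete", "b3": "Fragmented"},
    "outgroup": {"b1": "Complete", "b2": "Complete", "b3": "Complete"},
}
print(sorted(shared_single_copy(tables)))  # ['b1']
```

A single duplicated or fragmented copy in any genome, including the outgroup, removes that BUSCO from the set, which is why the shared set shrinks as more genomes are added.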
Structural and functional annotation
To compare functional annotations within the Mycosphaerellaceae family, we included 13 genomes with gene annotations published on NCBI (the methods and software previously used for the structural annotation of these species are summarized in Supplementary Table S1), and performed de novo structural annotations for the seven genomes lacking this information: P. crystallina, P. cruenta, P. fuligena, P. vitis, P. pini-densiflorae, Pa. egenula and Rh. mozambica. We trained the AUGUSTUS v.3.5.0 gene predictor [83] with gene models from the P. ulei reference genome [46]. Predicted proteomes were extracted from the gene candidate catalogs using gffread v0.12.1 [84]. To predict proteins interacting with the plant host, we used the Predector pipeline v.1.2.7 [85]. Predector integrates a range of fungal secretome and effector discovery tools and ranks effector candidates using a machine-learning approach. Predector also identifies carbohydrate-active enzymes (CAZymes) through homology searches of amino acid sequences against the dbCAN v.10 database [86]. Secondary metabolite gene clusters were predicted from genome assemblies using the antiSMASH web server v.7.0 [87]. To detect genes of the RIP machinery, we conducted TBLASTN searches using the annotated P. fijiensis orthologs of the methyltransferases DIM-2 (XP_007928308.1) and RID (XP_007920475.1) as queries. We retained hits with a sequence similarity above 50% and a sequence coverage above 70%. Matches below these thresholds were considered to indicate absence or fragmentation of the gene.
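The presence/absence rule for DIM-2 and RID can be written as a small classification function. This is a sketch under stated assumptions: each TBLASTN hit is summarized as a percent similarity plus the aligned span on the query protein, and coverage is measured relative to the query length. The record type and field names are hypothetical and do not correspond to a specific BLAST output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TblastnHit:
    """Hypothetical summary of one TBLASTN hit of a query protein."""
    percent_similarity: float  # similarity of the aligned region, in %
    query_start: int           # 1-based start of the aligned query segment
    query_end: int             # 1-based inclusive end of the aligned query segment


def classify_rip_gene(hits: List[TblastnHit], query_length: int,
                      min_similarity: float = 50.0,
                      min_coverage: float = 70.0) -> str:
    """Call a RIP gene 'present' if any hit exceeds both thresholds;
    otherwise report it as absent or fragmented."""
    for hit in hits:
        coverage = 100.0 * (hit.query_end - hit.query_start + 1) / query_length
        if hit.percent_similarity > min_similarity and coverage > min_coverage:
            return "present"
    return "absent_or_fragmented"


# A 1000 aa query matched over residues 1-800 at 62% similarity passes both filters.
print(classify_rip_gene([TblastnHit(62.0, 1, 800)], 1000))  # present
```

Because coverage is assessed per hit, a gene broken into several short hits by frameshifts or stop codons is reported as absent or fragmented even if the hits jointly span most of the query.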
Assembly-free transposable element annotations
Given the variable quality of genome assemblies in the Pseudocercospora genus, we analyzed TEs with dnaPipeTE, an assembly-free tool optimized for TE detection in low-coverage short-read datasets [88]. Briefly, dnaPipeTE analyzes repetitive elements directly from raw genomic reads. It draws three subsamples of reads at low coverage (< 1x) and assembles the repeats they contain into contigs. The assembled contigs are then annotated through homology comparisons against the Dfam database [89]. dnaPipeTE was used for TE annotation only for the eight Pseudocercospora species and the two closely related species Pa. egenula and Rh. mozambica for which short-read sequencing data were available (Supplementary Table S1).
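The low-coverage subsampling can be made concrete with a short calculation: the number of reads needed to reach a target coverage c of a genome of size G with reads of length L is c·G/L. The sketch below computes this; the genome size, read length and coverage in the example are hypothetical, and the helper is not part of dnaPipeTE, which performs its sampling internally.

```python
import math


def reads_for_coverage(genome_size_bp: int, read_length_bp: int,
                       coverage: float) -> int:
    """Number of reads needed to reach `coverage` x of a genome (rounded down)."""
    if genome_size_bp <= 0 or read_length_bp <= 0 or coverage <= 0:
        raise ValueError("genome size, read length and coverage must be positive")
    return math.floor(coverage * genome_size_bp / read_length_bp)


# Hypothetical 50 Mb genome sequenced with 150 bp reads, subsampled to 0.25x.
print(reads_for_coverage(50_000_000, 150, 0.25))  # 83333
```

At such low coverage, single-copy regions are rarely covered by overlapping reads, so mainly high-copy sequences gather enough reads to be assembled, which is what makes the approach selective for repeats.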
Sampling of P. ulei strains
We gathered a total of six P. ulei strains from the main rubber-producing regions of Colombia (Fig. 7A). The strains were isolated from H. brasiliensis clones established in different regions of Colombia. Two strains were isolated from leaves of the FX 3864 clone located in Vereda Santa Rosa in the Guaviare Department and provided by the Guaviare Rubber Producers and Marketers Association. Three strains were isolated from leaves of the IAN 873 clone located on the Los Gomas farm in the municipality of Belén de los Andaquíes and provided by the Universidad de la Amazonia. One strain was isolated from leaves of the RRIM600 clone located in the clonal gardens of Villavicencio–La Libertad and provided by the Corporación Colombiana de Investigación Agropecuaria (Agrosavia). All samples were obtained under Addendum No. 20 of the Framework Contract for Access to Genetic Resources and their Derivatives (No. 121 of January 22, 2016) established between the Ministry of Environment and Sustainable Development and the National University of Colombia. Information on each geo-referenced sampling point is given in Supplementary Table S2.
Propagules were isolated from single sporulating foliar lesions, from which conidia were collected and cultured on M3 solid medium at 25 °C in the dark for 45 days until visible stroma formation, following the protocol in [90]. Once stromata reached a size of 5 × 5 mm, they were macerated in 2 mL microcentrifuge tubes (Eppendorf®, Germany) and transferred to 125 mL flasks containing M4 solid sporulation medium, which consists of potato broth, amino acids and peptone [90]. Sporulation was stimulated by exposing the cultures to white light for 90 min per day over six days [91].
Genomic DNA extraction and sequencing of P. ulei
High-molecular-weight DNA was extracted from sporulated stromata following Stirling's protocol [92], modified by adding a phenol extraction followed by three rounds of chloroform extraction. DNA concentration, integrity and purity were assessed by fluorometry (Qubit®), electrophoresis on a 1% agarose gel in Tris–borate–EDTA (TBE) buffer stained with SYBR Safe (0.5 mg/L), and spectrophotometry (Nanodrop®), respectively. Short-read sequencing was performed on a DNBSEQ PE150 instrument (MGI Inc., China) at the BGI Hong Kong Tech Solution NGS Lab, using the DNBseq DNA library construction kit. Raw reads were generated in paired-end mode (2 × 150 bp).
Genome-wide SNP analyses
To assess genetic variation among P. ulei strains, we analyzed whole-genome resequencing data from six strains collected in different departments of Colombia. Raw paired-end reads were quality-checked using FastQC v.0.11.9 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and adapters and low-quality sequences were trimmed using fastp v.1.01 with a minimum quality threshold of Q20 [93]. Trimmed reads were aligned to the P. ulei reference genome using BWA-MEM v0.7.17 [94]. SAM files were converted to BAM and sorted with SAMtools v.1.20 [95], and duplicates were marked using Picard v.2.27.4 (https://github.com/broadinstitute/picard). Variants were called with GATK v.4.3.0.0 HaplotypeCaller in GVCF mode for each sample, followed by joint genotyping with GenotypeGVCFs [96]. The resulting variant calls were filtered with GATK VariantFiltration, flagging variants that met any of the following criteria: QD < 20.0, QUAL < 10,000, MQ < 30.0, ReadPosRankSum < −2.0 or > 2.0, MQRankSum < −2.0 or > 2.0, and BaseQRankSum < −2.0 or > 2.0. To explore the population structure of the six P. ulei strains, we performed a principal component analysis (PCA) based on the filtered SNPs. The filtered VCF file was first converted to PLINK binary format using PLINK v.1.9 [97]. Linkage disequilibrium pruning was applied in PLINK with the option --indep-pairwise 50 10 0.2. A PCA was then conducted using the --pca option in PLINK, which generates eigenvalues and eigenvectors. The first two principal components were plotted in R [98], and sample metadata on geographic origin were added to the PCA plot to visualize clustering patterns associated with geographic regions.
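The hard-filtering thresholds listed above can be expressed as a predicate over a variant's INFO annotations. In the sketch below, a variant fails if any criterion is met, mirroring how filter expressions flag sites. The plain dictionary input is an assumption, and treating a missing annotation as passing is our own illustrative choice rather than a statement of the exact GATK command used.

```python
from typing import Dict, List

# Lower bounds: a variant fails if the annotation is below the threshold.
MIN_THRESHOLDS = {"QD": 20.0, "QUAL": 10_000.0, "MQ": 30.0}
# Rank-sum tests: a variant fails if the value is < -2.0 or > 2.0.
RANK_SUM_LIMIT = 2.0
RANK_SUMS = ("ReadPosRankSum", "MQRankSum", "BaseQRankSum")


def failed_filters(info: Dict[str, float]) -> List[str]:
    """Return the filters a variant fails; an empty list means PASS.

    Annotations absent from `info` are skipped (treated as passing).
    """
    failed = [name for name, minimum in MIN_THRESHOLDS.items()
              if name in info and info[name] < minimum]
    failed += [name for name in RANK_SUMS
               if name in info and abs(info[name]) > RANK_SUM_LIMIT]
    return failed


print(failed_filters({"QD": 12.0, "QUAL": 25000.0, "MQ": 60.0, "MQRankSum": -3.1}))
# ['QD', 'MQRankSum']
```

Returning the names of the failed filters, rather than a single boolean, matches the way flagged variants are usually labelled in the VCF FILTER column and makes it easy to see which criterion removes most sites.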
Transposable element genome annotation
To obtain high-quality TE libraries for the two largest and most contiguous genomes, those of P. fijiensis and P. ulei, we ran the Earl Grey pipeline v4.1 [99]. Earl Grey combines TE identification based on pre-existing libraries with de novo TE annotation. Repetitive elements were first identified and masked with RepeatMasker v4.1.2 (http://www.repeatmasker.org), ignoring low-complexity repeats and small RNA genes. The masked genome was subsequently used for de novo TE identification with RepeatModeler v.2.0.2 [100], using the RepBase v.23.08 and Dfam v.3.3 databases for DNA and amino acid sequence identification. TEs were classified based on the similarity between de novo annotated and known TEs, creating a new combined library. Finally, full-length long terminal repeat (LTR) retrotransposons were identified with LTR_Finder v1.07 [101]. To estimate TE distribution and potential impacts on gene integrity and expression, we compared the TE and gene annotations of P. fijiensis, P. ulei and Z. tritici separately, using BEDtools closest v.2.30.0 with the parameter -D a [102]. Genes with more than one TE insertion were counted only once.
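The gene-to-TE distance analysis can be sketched without BEDtools to show the underlying logic. For each gene, the sketch finds the closest TE on the same scaffold and reports a signed distance relative to the gene's strand: 0 for overlaps, negative for TEs upstream and positive for TEs downstream, similar in spirit to closest -D a. This is a simplified illustration using half-open (BED-style) coordinates and a gap-length distance. It does not reproduce every BEDtools convention (for example, tie handling or the distance reported for book-ended features), and the input tuples are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]        # (scaffold, start, end), 0-based half-open
Gene = Tuple[str, int, int, str, str]  # (scaffold, start, end, name, strand)


def signed_distance(gene: Gene, te: Interval) -> int:
    """0 = overlap; negative = TE upstream of the gene; positive = downstream."""
    _, g_start, g_end, _, strand = gene
    _, t_start, t_end = te
    if t_start < g_end and g_start < t_end:
        return 0
    if t_end <= g_start:               # TE lies to the left on the scaffold
        gap = g_start - t_end
        return -gap if strand == "+" else gap
    gap = t_start - g_end              # TE lies to the right on the scaffold
    return gap if strand == "+" else -gap


def closest_te(gene: Gene, tes: List[Interval]) -> Optional[int]:
    """Signed distance to the nearest TE on the same scaffold, or None."""
    distances = [signed_distance(gene, te) for te in tes if te[0] == gene[0]]
    return min(distances, key=abs) if distances else None


def genes_with_insertions(genes: List[Gene], tes: List[Interval]) -> int:
    """Count genes overlapping at least one TE; each gene is counted once."""
    return sum(1 for gene in genes if closest_te(gene, tes) == 0)


tes = [("scf1", 100, 200), ("scf1", 1000, 1500), ("scf2", 0, 50)]
print(closest_te(("scf1", 300, 800, "g1", "+"), tes))  # -100
print(closest_te(("scf1", 300, 800, "g2", "-"), tes))  # 100
```

The same TE is upstream of a plus-strand gene but downstream of a minus-strand gene at the same coordinates, which is why the sign depends on the gene's strand.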
Manual TE consensus identification
To obtain high-quality TE family consensus sequences, we conducted a manual curation as described in [57]. In short, the RepeatModeler and Earl Grey consensus sequences were first curated with WICKERsoft [103]: similar sequences were searched genome-wide with blastn v.2.13.0 [104]. For a subset of 15–25 hits, sequences were extracted with 300 bp of flanking sequence added up- and downstream, and a multiple sequence alignment was created with ClustalW v.2.1 [105]. Visual inspection, together with information on target site duplications and expected start and end sequences, was used to define the actual boundaries of each TE family [15], and higher-quality consensus sequences were created. New TE families were classified based on the homology of encoded proteins and the presence and type of terminal repeats, and named following the three-letter classification system [15]. To remove redundancy and TE families predicted from TE fragments, each new TE consensus sequence was compared with blastn against the already curated consensus sequences. Many previously predicted families turned out to be redundant, as they were fragments of full-length consensus sequences.
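Extracting each hit with 300 bp of flanking sequence requires clamping the extended coordinates at scaffold ends and reverse-complementing hits on the minus strand, so that all copies share one orientation in the alignment. A minimal Python sketch, assuming the scaffold sequence is already loaded as a string and hits are given as 0-based half-open coordinates; the function names are hypothetical, and this is not WICKERsoft's implementation.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (other characters are not complemented)."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_with_flanks(scaffold: str, start: int, end: int, strand: str,
                        flank: int = 300) -> str:
    """Return hit [start, end) extended by `flank` bp on each side, clamped to
    the scaffold boundaries and oriented on the hit's strand."""
    left = max(0, start - flank)
    right = min(len(scaffold), end + flank)
    seq = scaffold[left:right]
    return reverse_complement(seq) if strand == "-" else seq


scaffold = "A" * 10 + "GGGCCC" + "T" * 10
print(extract_with_flanks(scaffold, 10, 16, "+", flank=4))  # AAAAGGGCCCTTTT
```

The flanks are what make boundary definition possible: a TE copy ends where the alignment of the extracted sequences stops being conserved and the unrelated flanking DNA begins.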
A second round of TE curation was conducted to identify non-autonomous TE families lacking some or all protein-coding sequences. LArge Retrotransposon Derivatives (LARDs) and Terminal-repeat Retrotransposons In Miniature (TRIMs) were detected with LTR_Finder using the filters -d 2001 -D 6000 -l 30 -L 5000 and -d 30 -D 2000 -l 30 -L 500, respectively. Miniature Inverted-repeat Transposable Elements (MITEs) were detected with MITE Tracker [106]. Short Interspersed Nuclear Elements (SINEs) were detected with SINE-Finder in SINE-Scan [107, 108]. Predicted consensus sequences were compared with WICKERsoft as described above, and removed if fewer than five copies were detected in the whole genome or if a TE consensus sequence already existed. The P. ulei reference genome was then annotated with the curated consensus sequences using RepeatMasker with a cut-off value of 250, and hits to simple repeats and low-complexity regions were filtered out.
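The two LTR_Finder parameter sets define non-overlapping windows on the distance between the two LTRs, combined with LTR length ranges, followed by a minimum-copy filter. The Python sketch below applies the same windows to candidate elements that have already been detected. The candidate fields are hypothetical; we read -d/-D as minimum/maximum distance between LTRs and -l/-L as minimum/maximum LTR length, and treat all bounds as inclusive (both assumptions). The real curation also involved manual comparison with WICKERsoft.

```python
from dataclasses import dataclass
from typing import Optional

# (min_distance, max_distance, min_ltr_len, max_ltr_len), taken from the
# LTR_Finder filters -d, -D, -l, -L reported above; bounds assumed inclusive.
WINDOWS = {
    "LARD": (2001, 6000, 30, 5000),
    "TRIM": (30, 2000, 30, 500),
}
MIN_COPIES = 5


@dataclass
class Candidate:
    """Hypothetical summary of a candidate non-autonomous LTR element."""
    ltr_distance: int   # distance between the two LTRs, in bp
    ltr_length: int     # length of the LTRs, in bp
    genome_copies: int  # copies detected in the whole genome


def classify(candidate: Candidate) -> Optional[str]:
    """Return 'LARD' or 'TRIM' if the candidate fits a window and has at
    least MIN_COPIES copies; otherwise None (candidate discarded)."""
    if candidate.genome_copies < MIN_COPIES:
        return None
    for label, (d_min, d_max, l_min, l_max) in WINDOWS.items():
        if (d_min <= candidate.ltr_distance <= d_max
                and l_min <= candidate.ltr_length <= l_max):
            return label
    return None


print(classify(Candidate(ltr_distance=3500, ltr_length=400, genome_copies=12)))  # LARD
print(classify(Candidate(ltr_distance=800, ltr_length=150, genome_copies=3)))    # None
```

Because the distance windows meet at 2000/2001 bp without overlapping, each candidate can fall into at most one of the two classes.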
Phylogenetic reconstruction of RLG_Mira coding regions
To test whether the high-copy TE family RLG_Mira underwent a recent burst, we performed multiple sequence alignment and phylogenetic analyses of its coding regions, following an approach established by Oggenfuss et al. 2023. All full-length copies and fragments of RLG_Mira detected with RepeatMasker in P. ulei, together with one copy from P. macadamiae as an outgroup, were extracted from the genome assemblies with SAMtools faidx. Sequences on the negative strand were reverse-complemented. The coding sequence of RLG_Mira was identified with a blastx search against the PTREP18 TE protein database (https://trep-db.uzh.ch/), and the best hit was retained. A multiple sequence alignment containing all sequences from P. ulei, the copy from P. macadamiae and the coding sequence was created with MAFFT using the parameters --reorder --localpair --maxiterate 1000 --nomemsave --leavegappyregion. The alignment was then trimmed to the start and end positions of the coding sequence using extractalign from EMBOSS [58]. Sequences and fragments covering less than 50% of the coding region were removed with trimAl v.1.4.rev15 [109]. To prevent structural variants in a subset of RLG_Mira copies from distorting the phylogeny, conserved blocks were extracted with Gblocks v.0.91b using the parameters -t=d -b3=10 -b4=5 -b5=a -b0=5 [110]. The GC content of each sequence was calculated with geecee from EMBOSS. Maximum likelihood trees were estimated with RAxML v.8.2 [111]. First, 10 independent maximum likelihood tree searches were conducted with the parameters raxmlHPC-PTHREADS-SSE3 -T 4 -m GTRGAMMA -p 12345 -# 10 --print-identical-sequences, and the best maximum likelihood tree was retained. Second, a bootstrap analysis was performed to obtain branch support values with the parameters raxmlHPC-PTHREADS-SSE3 -T 4 -m GTRGAMMA -p 12345 -b 12345 -# 50 --print-identical-sequences.
Finally, bipartitions were added to the best maximum likelihood tree with the parameters raxmlHPC-PTHREADS-SSE3 -T 4 -m GTRGAMMA -p 12345 -f b --print-identical-sequences. The best-scoring maximum likelihood tree was then visualized in R, using read.tree from the package treeio v.1.10.0 for import, ape v.5.7.1 to root the tree on the P. macadamiae copy, tibble v.3.0.1 to add GC content information to the tree, and ggtree for plotting [112–115]. To assess whether RLG_Mira entered P. ulei via horizontal transfer, we performed blastx searches and found that the best hits were in fungi, including Metarhizium anisopliae.
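Two per-sequence quantities in this workflow are simple to compute directly: the fraction of the coding region that an aligned copy covers (the 50% cut-off applied with trimAl) and its GC content (reported by EMBOSS geecee). The Python sketch below computes both from gapped alignment rows. It is an illustration under our own assumptions (gaps written as '-', GC measured on ungapped bases only), not the trimAl or geecee implementation, and the example rows are hypothetical.

```python
from typing import Dict

GAP = "-"


def coverage_fraction(aligned_row: str) -> float:
    """Fraction of alignment columns (the trimmed coding region) that are not gaps."""
    if not aligned_row:
        return 0.0
    return sum(base != GAP for base in aligned_row) / len(aligned_row)


def gc_content(aligned_row: str) -> float:
    """GC fraction of the ungapped bases in an aligned sequence."""
    bases = [b for b in aligned_row.upper() if b != GAP]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def keep_copies(rows: Dict[str, str], min_coverage: float = 0.5) -> Dict[str, float]:
    """Return GC content for copies covering at least `min_coverage` of the region."""
    return {name: gc_content(row) for name, row in rows.items()
            if coverage_fraction(row) >= min_coverage}


rows = {"copy1": "ATGCGC--", "copy2": "AT------", "copy3": "GGCC----"}
print(sorted(keep_copies(rows)))  # ['copy1', 'copy3']
```

Mapping GC content onto the tree in this way helps distinguish RIP-affected copies, whose C-to-T mutations lower GC content, from copies that escaped RIP.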
Genomic environment of the high-quality reference genome of P. ulei
To characterize the genomic environment of P. ulei, the 12 largest scaffolds of the reference genome were split into non-overlapping 10 kb windows using EMBOSS splitter v.6.6.0 [116]. The percentage of each window covered by annotated TEs, by the high-copy TE family RLG_Mira and by genes was calculated using BEDtools intersect v.2.30.0 [102]. To assess the potential impact of RIP mutations, large RIP-affected regions in the reference genome were detected using The RIPper [117]. Visualizations were made with circos [118].
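The per-window percentages can be sketched as an interval computation: merge overlapping annotations so that overlapping features are not double-counted, then sum their overlap with each non-overlapping 10 kb window. The Python sketch below does this for one scaffold using half-open coordinates. It illustrates the calculation rather than the BEDtools intersect commands used; keeping a final, shorter window is our assumption, and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def window_coverage(scaffold_length: int, features: List[Interval],
                    window: int = 10_000) -> List[float]:
    """Percentage of each non-overlapping window covered by the features."""
    merged = merge(features)
    percentages = []
    for w_start in range(0, scaffold_length, window):
        w_end = min(w_start + window, scaffold_length)  # last window may be shorter
        covered = sum(max(0, min(end, w_end) - max(start, w_start))
                      for start, end in merged)
        percentages.append(100.0 * covered / (w_end - w_start))
    return percentages


features = [(0, 5_000), (4_000, 12_000), (20_000, 25_000)]
print(window_coverage(25_000, features))  # [100.0, 20.0, 100.0]
```

Merging first matters for TE annotations in particular, because nested and fragmented copies often produce overlapping hits that would otherwise push coverage above 100%.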
TE copy number estimation for P. ulei strains
The reference genome is not necessarily representative of the whole species and might be an outlier, which could explain its high TE density. To determine whether field strains from different regions contain similar numbers of TEs, we estimated the coverage of each manually curated TE family. Raw reads were first trimmed with Trimmomatic v.0.33 using the parameters ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36. Copy numbers for each TE family were then estimated from normalized coverage using the coverage method of the McClintock pipeline [119]. We attempted to track the positions of the annotated TEs; however, because of their high abundance, it was not possible to identify homologous sites with matches spanning both TEs and non-repetitive genomic regions, which prevented their accurate localization within the P. ulei genome.
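Coverage-based copy-number estimates rest on a simple idea: reads from all copies of a TE family pile up on its single consensus sequence, so the family's mean read depth divided by the average depth over single-copy regions approximates its copy number. A minimal sketch of that ratio follows. The family names and depth values are hypothetical, and McClintock's coverage module performs its own read mapping and normalization, which this sketch does not reproduce.

```python
from statistics import mean
from typing import Dict, Sequence


def estimate_copy_numbers(te_depths: Dict[str, Sequence[float]],
                          single_copy_depths: Sequence[float]) -> Dict[str, float]:
    """Approximate copies per TE family as mean depth on the family consensus
    divided by mean depth over single-copy regions."""
    baseline = mean(single_copy_depths)
    if baseline <= 0:
        raise ValueError("single-copy depth must be positive")
    return {family: mean(depths) / baseline for family, depths in te_depths.items()}


depths = {"family_A": [3000, 3300, 2700], "family_B": [600, 900]}
print(estimate_copy_numbers(depths, single_copy_depths=[30, 32, 28]))
# {'family_A': 100.0, 'family_B': 25.0}
```

Because this estimate needs no insertion coordinates, it remains usable when, as here, individual TE copies cannot be localized in the genome.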
Acknowledgements
We are grateful to the individuals and institutions who provided natural rubber samples that enabled the isolation of Pseudocercospora ulei and the generation of Illumina sequencing data from several departments in Colombia. Specifically, we thank Lyda Constanza Galindo Rodríguez from the University of Amazonia for samples collected in the Caquetá department; the ASOPROCAUCHO Association of Rubber Producers and Traders of Guaviare for samples from the Guaviare department; and Olga María Castro from the Research Group on Conservation Agriculture for Lowland Tropical Soils at AGROSAVIA, La Libertad Research Center, for samples from the Meta department. We thank Tobias Baril from the University of Neuchâtel for help with the Earl Grey pipeline.
Authors’ contributions
SMGS, UO and DC designed the study. IAG, CAT and FAA provided biological material and performed experiments. AZZ and IS contributed datasets. SMGS and UO conducted analyses. SMGS, UO and DC wrote the manuscript. SMGS and UO acquired funding. UO and DC supervised the work. All authors approved the final manuscript version.
Funding
SMGS was supported by the Internship Excellence—Foreign Students FCS Postdoctoral Fellowship, granted by the Federal Commission for Scholarships of the Swiss Confederation. UO was supported by the Swiss National Science Foundation (P5R5PB_225522).
Data availability
Sequence data are available from the NCBI Sequence Read Archive under BioProject PRJNA1283274. Genome assemblies for Pa. egenula and Rh. mozambica, gene annotations for the Pseudocercospora species, TE annotations for P. fijiensis and P. ulei, and the TE family consensus sequences are available on Zenodo: https://zenodo.org/records/15862053.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Ursula Oggenfuss and Daniel Croll jointly supervised the work.
Contributor Information
Ursula Oggenfuss, Email: ursula.oggenfuss@gmail.com.
Daniel Croll, Email: daniel.croll@unine.ch.
References
- 1. Zaccaron AZ, Stergiopoulos I. The dynamics of fungal genome organization and its impact on host adaptation and antifungal resistance. Journal of Genetics and Genomics. 2025;52(5):628–40. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1673852724002844
- 2. Elliott TA, Gregory TR. What's in a genome? The C-value enigma and the evolution of eukaryotic genome content. Philosophical Transactions of the Royal Society B: Biological Sciences. 2015;370(1678):20140331. Available from: https://royalsocietypublishing.org/doi/10.1098/rstb.2014.0331
- 3. Raffaele S, Kamoun S. Genome evolution in filamentous plant pathogens: why bigger can be better. Nat Rev Microbiol. 2012;10(6):417–30.
- 4. Puttick MN, Clark J, Donoghue PCJ. Size is not everything: rates of genome size evolution, not C-value, correlate with speciation in angiosperms. Proceedings of the Royal Society B: Biological Sciences. 2015;282(1820):20152289. Available from: https://royalsocietypublishing.org/doi/10.1098/rspb.2015.2289
- 5. Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302(5649):1401–4.
- 6. Lynch M. Complexity myths and the misappropriation of evolutionary theory. Proceedings of the National Academy of Sciences. 2025;122(23):1–5. Available from: https://pnas.org/doi/10.1073/pnas.2425772122
- 7. Lower SS, McGurk MP, Clark AG, Barbash DA. Satellite DNA evolution: old ideas, new approaches. Curr Opin Genet Dev. 2018;49:70–8. 10.1016/j.gde.2018.03.003.
- 8. Plissonneau C, Stürchler A, Croll D. The Evolution of Orphan Regions in Genomes of a Fungal Pathogen of Wheat. mBio. 2016;7(5):1–13. Available from: http://mbio.asm.org/lookup/doi/10.1128/mBio.01231-16
- 9. Scott AL, Richmond PA, Dowell RD, Selmecki AM. The influence of polyploidy on the evolution of yeast grown in a sub-optimal carbon source. Mol Biol Evol. 2017;34(10):2690–703.
- 10. Todd RT, Forche A, Selmecki A. Ploidy Variation in Fungi: Polyploidy, Aneuploidy, and Genome Evolution. Heitman J, Stukenbrock EH, editors. Microbiol Spectr. 2017;5(4):139–48. Available from: https://journals.asm.org/doi/10.1128/microbiolspec.FUNK-0051-2016
- 11. Sipos G, Prasanna AN, Walter MC, O'Connor E, Bálint B, Krizsán K, et al. Genome expansion and lineage-specific genetic innovations in the forest pathogenic fungi Armillaria. Nat Ecol Evol. 2017;1(12):1931–41. 10.1038/s41559-017-0347-8.
- 12. Frantzeskakis L, Németh MZ, Barsoum M, Kusch S, Kiss L, Takamatsu S, et al. The Parauncinula polyspora Draft Genome Provides Insights into Patterns of Gene Erosion and Genome Expansion in Powdery Mildew Fungi. mBio. 2019;10(5):e01692-19.
- 13. McClintock B. Induction of Instability at Selected Loci in Maize. Genetics. 1953;38(6):579–99. Available from: http://www.ncbi.nlm.nih.gov/pubmed/17247459, http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC1209627
- 14. Fedoroff N, Wessler S, Shure M. Isolation of the transposable maize controlling elements Ac and Ds. Cell. 1983;35(1):235–42.
- 15. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007;8(12):973–82.
- 16. Wells JN, Feschotte C. A field guide to transposable elements. Annu Rev Genet. 2020;54:7–34.
- 17. Wessler S. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr Opin Genet Dev. 1995;5(6):814–21. Available from: https://linkinghub.elsevier.com/retrieve/pii/0959437X9580016X
- 18. Hill R, Smith D, Canning G, Grey M, Hammond-Kosack KE, McMullan M. Starship giant transposable elements cluster by host taxonomy using k-mer-based phylogenetics. G3: Genes, Genomes, Genetics. 2025;15(6). Available from: https://academic.oup.com/g3journal/article/doi/10.1093/g3journal/jkaf082/8110972
- 19. Batzer MA, Deininger PL. Alu repeats and human genomic diversity. Nat Rev Genet. 2002;3(5):370–9. Available from: https://www.nature.com/articles/nrg798
- 20. Castanera R, Vendrell-Mir P, Bardil A, Carpentier M, Panaud O, Casacuberta JM. Amplification dynamics of miniature inverted-repeat transposable elements and their impact on rice trait variability. The Plant Journal. 2021;107(1):118–35. Available from: https://onlinelibrary.wiley.com/doi/10.1111/tpj.15277
- 21. Peter M, Kohler A, Ohm RA, Kuo A, Krützmann J, Morin E, et al. Ectomycorrhizal ecology is imprinted in the genome of the dominant symbiotic fungus Cenococcum geophilum. Nat Commun. 2016;7(1):1–15. Available from: http://www.nature.com/articles/ncomms12662
- 22. Fouché S, Badet T, Oggenfuss U, Plissonneau C, Francisco CS, Croll D. Stress-Driven Transposable Element De-repression Dynamics and Virulence Evolution in a Fungal Pathogen. Mol Biol Evol. 2020;37(1):221–39. Available from: https://academic.oup.com/mbe/article/37/1/221/5573762
- 23. Gusa A, Yadav V, Roth C, Williams JD, Shouse EM, Magwene P, et al. Genome-wide analysis of heat stress-stimulated transposon mobility in the human fungal pathogen Cryptococcus deneoformans. Proc Natl Acad Sci U S A. 2023;120(4). 10.1073/pnas.2209831120.
- 24. Fouché S, Oggenfuss U, McDonald BA, Croll D. Recurrent chromosome destabilization through repeat-mediated rearrangements in a fungal pathogen. bioRxiv. 2023. Available from: http://biorxiv.org/lookup/doi/10.1101/2023.07.14.549097
- 25. Devos KM, Brown JKM, Bennetzen JL. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 2002;12(7):1075–9.
- 26. Schrader L, Schmitz J. The impact of transposable elements in adaptive evolution. Mol Ecol. 2019;28(6):1537–49. Available from: https://onlinelibrary.wiley.com/doi/10.1111/mec.14794
- 27. Le Rouzic A, Boutin TS, Capy P. Long-term evolution of transposable elements. Proc Natl Acad Sci U S A. 2007;104(49):19375–80.
- 28. Slotkin RK, Martienssen R. Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet. 2007;8(4):272–85.
- 29. Bewick AJ, Hofmeister BT, Powers RA, Mondo SJ, Grigoriev IV, James TY, et al. Diversity of cytosine methylation across the fungal tree of life. Nat Ecol Evol. 2019;3(3):479–90. 10.1038/s41559-019-0810-9.
- 30. Bannister AJ, Kouzarides T. Regulation of chromatin by histone modifications. Cell Res. 2011;21(3):381–95. 10.1038/cr.2011.22.
- 31. Galagan JE, Selker EU. RIP: the evolutionary cost of genome defense. Trends in Genetics. 2004;20(9):417–23.
- 32. Krishnan P, Meile L, Plissonneau C, Ma X, Hartmann FE, Croll D, et al. Transposable element insertions shape gene regulation and melanin production in a fungal pathogen of wheat. BMC Biol. 2018;16(1):1–18.
- 33. Liu X, Wang X, Zhou F, Xue Y, Liu C. Genomic insights into Penicillium chrysogenum adaptation to subseafloor sedimentary environments. BMC Genomics. 2024;25(1):4. Available from: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-023-09921-1
- 34. Wong VL, Ellison CE, Eisen MB, Pachter L, Brem RB. Structural Variation among Wild and Industrial Strains of Penicillium chrysogenum. PLoS One. 2014;9(5):e96784. Available from: https://dx.plos.org/10.1371/journal.pone.0096784
- 35. Agrawal A, Eastman QM, Schatz DG. Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system. Nature. 1998;394(6695):744–51.
- 36. Cosby RL, Judd J, Zhang R, Zhong A, Garry N, Pritham EJ, et al. Recurrent evolution of vertebrate transcription factors by transposase capture. Science. 2021;371(6531):eabc6405. Available from: https://www.sciencemag.org/lookup/doi/10.1126/science.abc6405
- 37. Groenewald JZ, Chen YY, Zhang Y, Roux J, Shin H-D, Shivas RG, et al. Species diversity in Pseudocercospora. Fungal Syst Evol. 2024;13(1):29–89. Available from: https://www.ingentaconnect.com/content/10.3114/fuse.2024.13.03
- 38. Liang C, Jayawardena RS, Zhang W, Wang X, Liu M, Liu L, et al. Identification and Characterization of Pseudocercospora Species Causing Grapevine Leaf Spot in China. Journal of Phytopathology. 2016;164(2):75–85. Available from: https://onlinelibrary.wiley.com/doi/10.1111/jph.12427
- 39. Crous PW, Braun U, Hunter GC, Wingfield MJ, Verkley GJM, Shin HD, et al. Phylogenetic lineages in Pseudocercospora. Stud Mycol. 2013;75:37–114. Available from: https://www.ingentaconnect.com/content/10.3114/sim0005
- 40. Friesen TL. Combating the Sigatoka Disease Complex on Banana. PLoS Genet. 2016;12(8):e1006234. Available from: https://dx.plos.org/10.1371/journal.pgen.1006234
- 41. Guyot J, Le Guen V. A Review of a Century of Studies on South American Leaf Blight of the Rubber Tree. Plant Dis. 2018;102(6):1052–65. Available from: https://apsjournals.apsnet.org/doi/10.1094/PDIS-04-17-0592-FE
- 42. Chang TC, Salvucci A, Crous PW, Stergiopoulos I. Comparative Genomics of the Sigatoka Disease Complex on Banana Suggests a Link between Parallel Evolutionary Changes in Pseudocercospora fijiensis and Pseudocercospora eumusae and Increased Virulence on the Banana Host. PLoS Genet. 2016;12(8):e1005904. Available from: https://dx.plos.org/10.1371/journal.pgen.1005904
- 43. Arango Isaza RE, Diaz-Trujillo C, Dhillon B, Aerts A, Carlier J, Crane CF, et al. Combating a Global Threat to a Clonal Crop: Banana Black Sigatoka Pathogen Pseudocercospora fijiensis (Synonym Mycosphaerella fijiensis) Genomes Reveal Clues for Disease Control. PLoS Genet. 2016;12(8):e1005876. Available from: https://dx.plos.org/10.1371/journal.pgen.1005876
- 44. Akinsanmi OA, Carvalhais LC. Draft Genome of the Macadamia Husk Spot Pathogen, Pseudocercospora macadamiae. Phytopathology. 2020;110(9):1503–6. Available from: https://apsjournals.apsnet.org/doi/10.1094/PHYTO-12-19-0460-A
- 45. Zaccaron AZ, Stergiopoulos I. First Draft Genome Resource for the Tomato Black Leaf Mold Pathogen Pseudocercospora fuligena. Molecular Plant-Microbe Interactions®. 2020;33(12):1441–5. Available from: https://apsjournals.apsnet.org/doi/10.1094/MPMI-06-20-0139-A
- 46. González Sáyer SM, Oggenfuss U, García I, Aristizabal F, Croll D, Riaño-Pachon DM. High-quality genome assembly of Pseudocercospora ulei the main threat to natural rubber trees. Genet Mol Biol. 2022;45(1):1–5. Available from: http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572022000100802&tlng=en
- 47. Sinha S, Navathe S, Anjali, Vishwakarma S, Prajapati P, Chand R, et al. Whole genome sequencing and annotation of Pseudocercospora abelmoschi, a causal agent of black leaf mould of okra. World J Microbiol Biotechnol. 2025;41(5):174. Available from: https://link.springer.com/10.1007/s11274-025-04398-4
- 48. Santana MF, Silva JC, Batista AD, Ribeiro LE, da Silva GF, de Araújo EF, et al. Abundance, distribution and potential impact of transposable elements in the genome of Mycosphaerella fijiensis. BMC Genomics. 2012;13(1):1–11.
- 49. Dhillon B, Kema GH, Hamelin RC, Bluhm BH, Goodwin SB. Variable genome evolution in fungi after transposon-mediated amplification of a housekeeping gene. Mob DNA. 2019;10(1):37. 10.1101/550798.
- 50. Arzanlou M, Crous PW, Zwiers LH. Evolutionary Dynamics of Mating-Type Loci of Mycosphaerella spp. Occurring on Banana. Eukaryot Cell. 2010;9(1):164–72. Available from: https://journals.asm.org/doi/10.1128/EC.00194-09
- 51. Conde-Ferráez L, Waalwijk C, Canto-Canché BB, Kema GHJ, Crous PW, James AC, et al. Isolation and characterization of the mating type locus of Mycosphaerella fijiensis, the causal agent of black leaf streak disease of banana. Mol Plant Pathol. 2007;8(1):111–20. Available from: https://bsppjournals.onlinelibrary.wiley.com/doi/10.1111/j.1364-3703.2006.00376.x
- 52.Aylward J, Havenga M, Wingfield BD, Wingfield MJ, Dreyer LL, Roets F, et al. Novel mating-type-associated genes and gene fragments in the genomes of Mycosphaerellaceae and Teratosphaeriaceae fungi. Mol Phylogenet Evol . 2022;171:107456. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1055790322000690 [DOI] [PubMed]
- 53.Chen Q, Bakhshi M, Balci Y, Broders KD, Cheewangkoon R, Chen SF, et al. Genera of phytopathogenic fungi: GOPHY 4. Stud Mycol . 2022;101(1):417–564. Available from: https://www.ingentaconnect.com/content/10.3114/sim.2022.101.06 [DOI] [PMC free article] [PubMed]
- 54.Crous PW, Wingfield MJ, Cheewangkoon R, Carnegie AJ, Burgess TI, Summerell BA, et al. Foliar pathogens of eucalypts. Stud Mycol . 2019;94(1):125–298. Available from: https://www.ingentaconnect.com/content/10.1016/j.simyco.2019.08.001 [DOI] [PMC free article] [PubMed]
- 55.Jia M, Gong X, Fan M, Liu H, Zhou H, Gu S, et al. Identification and analysis of the secretome of plant pathogenic fungi reveals lifestyle adaptation. Front Microbiol . 2023;14. Available from: https://www.frontiersin.org/articles/10.3389/fmicb.2023.1171618/full [DOI] [PMC free article] [PubMed]
- 56.Wei K, Aldaimalani R, Mai D, Zinshteyn D, PRV S, Blumenstiel JP, et al. Rethinking the “gypsy” retrotransposon: A roadmap for community-driven reconsideration of problematic gene names. OSFpreprints . 2022;10.31219/o. Available from: https://osf.io/fma57/
- 57.Baril T, Croll D. A pangenome-guided manually curated library of transposable elements for Zymoseptoria tritici. BMC Res Notes. 2023;16(1):23–6. 10.1186/s13104-023-06613-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16(6):276–7.
- 59. Badet T, Oggenfuss U, Abraham L, McDonald BA, Croll D. A 19-isolate reference-quality global pangenome for the fungal wheat pathogen Zymoseptoria tritici. BMC Biol. 2020;18(1):12. Available from: https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-020-0744-3. Cited 2026 Feb 14.
- 60. Miyauchi S, Kiss E, Kuo A, Drula E, Kohler A, Sánchez-García M, et al. Large-scale genome sequencing of mycorrhizal fungi provides insights into the early evolution of symbiotic traits. Nat Commun. 2020;11(1):1–17.
- 61. Castanera R, Borgognone A, Pisabarro AG, Ramírez L. Biology, dynamics, and applications of transposable elements in basidiomycete fungi. Appl Microbiol Biotechnol. 2017;101(4):1337–50. 10.1007/s00253-017-8097-8.
- 62. Tavares S, Ramos AP, Pires AS, Azinheira HG, Caldeirinha P, Link T, et al. Genome size analyses of Pucciniales reveal the largest fungal genomes. Front Plant Sci. 2014;5:422. Available from: http://journal.frontiersin.org/article/10.3389/fpls.2014.00422/abstract
- 63. Aime MC, McTaggart AR, Mondo SJ, Duplessis S. Phylogenetics and Phylogenomics of Rust Fungi. In: Advances in Genetics. Academic Press Inc.; 2017. p. 267–307. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0065266017300391
- 64. Murat C, Payen T, Noel B, Kuo A, Morin E, Chen J, et al. Pezizomycetes genomes reveal the molecular basis of ectomycorrhizal truffle lifestyle. Nat Ecol Evol. 2018;2(12):1956–65. Available from: https://www.nature.com/articles/s41559-018-0710-4
- 65. Le Rouzic A, Capy P. The first steps of transposable elements invasion: parasitic strategy vs. genetic drift. Genetics. 2005;169(2):1033–43.
- 66. Panaud O. Horizontal transfers of transposable elements in eukaryotes: The flying genes. C R Biol. 2016;339(7–8):296–9. Available from: https://comptes-rendus.academie-sciences.fr/biologies/articles/10.1016/j.crvi.2016.04.013/
- 67. Griem-Krey H, de Fraga Sant’Ana J, Oggenfuss U, Calegari-Alves YP, Marques AL, Berger M, et al. Transposable elements hitchhike on Starships across fungal genomes. Nat Commun. 2026. Available from: https://www.nature.com/articles/s41467-026-69410-3
- 68. van Wyk S, Wingfield BD, De Vos L, van der Merwe NA, Steenkamp ET. Genome-Wide Analyses of Repeat-Induced Point Mutations in the Ascomycota. Front Microbiol. 2021;11:622368. Available from: https://www.frontiersin.org/articles/10.3389/fmicb.2020.622368/full
- 69. Miousse IR, Chalbot MCG, Lumen A, Ferguson A, Kavouras IG, Koturbash I. Response of transposable elements to environmental stressors. Mutat Res Rev Mutat Res. 2015;765:19–39. 10.1016/j.mrrev.2015.05.003.
- 70. Van Parijs J, Broekaert WF, Goldstein IJ, Peumans WJ. Hevein: an antifungal protein from rubber-tree (Hevea brasiliensis) latex. Planta. 1991;183(2):258–64. Available from: http://link.springer.com/10.1007/BF00197797
- 71. Mei S, Hou S, Cui H, Feng F, Rong W. Characterization of the interaction between Oidium heveae and Arabidopsis thaliana. J Dig Dis. 2016;17(9):1331–43.
- 72. Evangelisti E, Gogleva A, Hainaux T, Doumane M, Tulin F, Quan C, et al. Time-resolved dual transcriptomics reveal early induced Nicotiana benthamiana root genes and conserved infection-promoting Phytophthora palmivora effectors. BMC Biol. 2017;15(1).
- 73. Hsieh DK, Chuang SC, Chen CY, Chao YT, Lu MYJ, Lee MH, et al. Comparative Genomics of Three Colletotrichum scovillei Strains and Genetic Analysis Revealed Genes Involved in Fungal Growth and Virulence on Chili Pepper. Front Microbiol. 2022;13. Available from: https://www.frontiersin.org/articles/10.3389/fmicb.2022.818291/full
- 74. Ali SS, Asman A, Shao J, Balidion JF, Strem MD, Puig AS, et al. Genome and transcriptome analysis of the latent pathogen Lasiodiplodia theobromae, an emerging threat to the cacao industry. Genome. 2020;63(1):37–52.
- 75. Longsaward R, Viboonjun U, Wen Z, Asiegbu FO. In silico analysis of secreted effectorome of the rubber tree pathogen Rigidoporus microporus highlights its potential virulence proteins. Front Microbiol. 2024;15.
- 76. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
- 77. Vaghefi N, Kusch S, Németh MZ, Seress D, Braun U, Takamatsu S, et al. Beyond Nuclear Ribosomal DNA Sequences: Evolution, Taxonomy, and Closest Known Saprobic Relatives of Powdery Mildew Fungi (Erysiphaceae) Inferred From Their First Comprehensive Genome-Scale Phylogenetic Analyses. Front Microbiol. 2022;13. Available from: https://www.frontiersin.org/articles/10.3389/fmicb.2022.903024/full
- 78. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–77. Available from: http://www.ncbi.nlm.nih.gov/pubmed/22506599, http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC3342519
- 79. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–5. Available from: https://academic.oup.com/bioinformatics/article/29/8/1072/228832
- 80. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2.
- 81. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.
- 82. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol. 2020;37(5):1530–4.
- 83. Zhang C, Rabiee M, Sayyari E, Mirarab S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics. 2018;19(S6):153. Available from: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2129-y
- 84. Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008;24(5):637–44. Available from: https://academic.oup.com/bioinformatics/article/24/5/637/202844
- 85. Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare. F1000Res. 2020;9:304. Available from: https://f1000research.com/articles/9-304/v1
- 86. Jones DAB, Rozano L, Debler JW, Mancera RL, Moolhuijzen PM, Hane JK. An automated and combinative method for the predictive ranking of candidate effector proteins of fungal plant pathogens. Sci Rep. 2021;11(1):19731. Available from: https://www.nature.com/articles/s41598-021-99363-0
- 87. Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2018;46(W1):W95–101.
- 88. Blin K, Shaw S, Kautsar SA, Medema MH, Weber T. The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes. Nucleic Acids Res. 2021;49(D1):D639–43. Available from: https://academic.oup.com/nar/article/49/D1/D639/5957162
- 89. Goubert C, Modolo L, Vieira C, Valiente Moro C, Mavingui P, Boulesteix M. De Novo Assembly and Annotation of the Asian Tiger Mosquito (Aedes albopictus) Repeatome with dnaPipeTE from Raw Genomic Reads and Comparative Analysis with the Yellow Fever Mosquito (Aedes aegypti). Genome Biol Evol. 2015;7(4):1192–205. Available from: https://academic.oup.com/gbe/article-lookup/doi/10.1093/gbe/evv050
- 90. Storer J, Hubley R, Rosen J, Wheeler TJ, Smit AF. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA. 2021;12(1):1–14.
- 91. Gasparotto L, Pereira JCR, editors. Doenças da seringueira no Brasil. 2a edição revista e atualizada. 2012. Available from: www.embrapa.br/liv
- 92. Lieberei R. South American Leaf Blight of the Rubber Tree (Hevea spp.): New Steps in Plant Domestication using Physiological Features and Molecular Markers. Ann Bot. 2007;100(6):1125–42. Available from: https://academic.oup.com/aob/article-lookup/doi/10.1093/aob/mcm133
- 93. Stirling D. DNA Extraction from Fungi, Yeast, and Bacteria. In: Bartlett JMS, Stirling D, editors. PCR Protocols. Methods in Molecular Biology, vol 226. Humana Press; 2003. 10.1385/1-59259-384-4:53
- 94. Chen S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta. 2023;2(2). Available from: https://onlinelibrary.wiley.com/doi/10.1002/imt2.107
- 95. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013. Available from: http://github.com/lh3/bwa
- 96. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btp352
- 97. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.
- 98. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4(1):1–16. Available from: https://academic.oup.com/gigascience/article/doi/10.1186/s13742-015-0047-8/2707533
- 99. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2022.
- 100. Baril T, Galbraith J, Hayward A. Earl Grey: A Fully Automated User-Friendly Transposable Element Annotation and Analysis Pipeline. Arkhipova I, editor. Mol Biol Evol. 2024;41(4). Available from: https://academic.oup.com/mbe/article/doi/10.1093/molbev/msae068/7635926
- 101. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 2020;117(17):9451–7.
- 102. Xu Z, Wang H. LTR-FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35(Suppl 2):265–8.
- 103. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.
- 104. Breen J, Wicker T, Kong X, Zhang J, Ma W, Paux E, et al. A highly conserved gene island of three genes on chromosome 3B of hexaploid wheat: Diverse gene function and genomic structure maintained in a tightly linked block. BMC Plant Biol. 2010;10(1):98. 10.1186/1471-2229-10-98
- 105. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.
- 106. Higgins D, Sharp PM. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene. 1988;73(1):237–44.
- 107. Crescente JM, Zavallo D, Helguera M, Vanzetti LS. MITE tracker: an accurate approach to identify miniature inverted-repeat transposable elements in large genomes. BMC Bioinformatics. 2018;19(1):1–10.
- 108. Mao H, Wang H. SINE-scan: an efficient tool to discover short interspersed nuclear elements (SINEs) in large-scale genomic datasets. Bioinformatics. 2017;33(5):743–5.
- 109. Wenke T, Dobel T, Sorensen TR, Junghans H, Weisshaar B, Schmidt T. Targeted Identification of Short Interspersed Nuclear Element Families Shows Their Widespread Existence and Extreme Heterogeneity in Plant Genomes. Plant Cell. 2011;23(9):3117–28. Available from: http://www.plantcell.org/cgi/doi/10.1105/tpc.111.088682
- 110. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25(15):1972–3.
- 111. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17(4):540–52.
- 112. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.
- 113. Wickham H, François R, Henry L, Müller K. dplyr: A Grammar of Data Manipulation. R package version 1.0.0. 2020.
- 114. Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35(3):526–8.
- 115. Wang LG, Lam TTY, Xu S, Dai Z, Zhou L, Feng T, et al. treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data. Mol Biol Evol. 2020;37(2):599–603.
- 116. Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2017;8(1):28–36.
- 117. van Wyk S, Harrison CH, Wingfield BD, De Vos L, van der Merwe NA, Steenkamp ET. The RIPper, a web-based tool for genome-wide quantification of Repeat-Induced Point (RIP) mutations. PeerJ. 2019;7:e7447. Available from: https://peerj.com/articles/7447
- 118. Krzywinski M, Schein J, Birol İ, Connors J, Gascoyne R, Horsman D, et al. Circos: An information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639–45. Available from: http://genome.cshlp.org/lookup/doi/10.1101/gr.092759.109
- 119. Nelson MG, Linheiro RS, Bergman CM. McClintock: An Integrated Pipeline for Detecting Transposable Element Insertions in Whole-Genome Shotgun Sequencing Data. G3 (Bethesda). 2017;7(8):2763–78. Available from: http://g3journal.org/lookup/doi/10.1534/g3.117.043893
Data Availability Statement
Sequence data are available from the NCBI Sequence Read Archive under BioProject PRJNA1283274. Genome assemblies for *Pa. egenula* and *Rh. mozambica*, gene annotations for the *Pseudocercospora* species, TE annotations for *P. fijiensis* and *P. ulei*, and the TE family consensus sequences are available on Zenodo: [https://zenodo.org/records/15862053](https://zenodo.org/records/15862053).