Genome Biology and Evolution. 2025 Aug 26;17(9):evaf166. doi: 10.1093/gbe/evaf166

Variation in Genome Architecture and Epigenetic Modification Across the Microsporidia Phylogeny

Pascal Angst, Dieter Ebert, Peter D Fields
Editor: Rebecca Zufall
PMCID: PMC12449221  PMID: 40857611

Abstract

Microsporidia are a model clade for studying intracellular parasitism, being well-known for their streamlined genomes and their extreme life history. Although microsporidia are highly diverse and ecologically important to a broad range of hosts, previous research on genome architecture has focused primarily on the mammal-infecting genus Encephalitozoon. Here, we expand that work, testing the universality of the patterns observed in Encephalitozoon by investigating and comparing variation in genetic and epigenetic architectures in the high-quality genome assemblies of several major microsporidia clades. Our comparison of nine genomes, including the first genome assemblies of Binucleata daphniae, Gurleya vavrai, and Conglomerata obtusa, and revised, improved assemblies of Glugoides intestinalis, Mitosporidium daphniae, and Ordospora colligata, found limited conservation of genetic and epigenetic architecture across all microsporidia, although many genomic characteristics, such as nucleotide composition and repeat content, were shared between genomes of the same or related clades. For example, rRNA genes were hypermethylated in most species, but their position close to chromosome ends was only found in the Encephalitozoon and its sister clade. GC-content varied widely, linked to genome size, phylogenetic position, and activity of repeat elements. These findings enhance our insight into genome evolution and, consistent with findings from other systems, suggest epigenetic modification as a regulatory mechanism of gene expression and repeat element activity in microsporidia. Our comparative genome analysis reveals high variation in genetic and epigenetic architecture among microsporidia, despite all of them adapting to a parasitic lifestyle within host cells.

Keywords: methylation, epigenetics, genome architecture, synteny, microsporidia, Daphnia


Significance Statement.

Microsporidia are widespread intracellular parasites that infect a range of hosts, from unicellular eukaryotes to humans, but our understanding of their genome evolution is currently mostly limited to a small number of model species. This study generated high-quality genome assemblies from diverse microsporidian lineages and uncovered striking variation in genome architecture, repeat element activity, and epigenetic modification. These results challenge previous assumptions about genome conservation in microsporidia and shed new light on how intracellular parasitism drives genome evolution in this diverse and understudied phylum.

Introduction

The study of parasite evolution looks at the mechanisms by which parasites adapt to exploit their hosts (Poulin and Randhawa 2015; Schmid-Hempel 2021). These adaptations include physical structures used to invade or attach to the host, secretory molecules that subvert the host's immune system or alter its behavior to the parasite's advantage, and genomic rearrangements that optimize gene transcription and genome replication. Genome evolution particularly has received much recent attention in the field of microsporidia. These fungi-related intracellular parasites exhibit high variation in genome length, possibly related to their life history (Haag et al. 2020), but although comparative studies of microsporidia have provided initial insights into what is arguably the most extreme form of parasitism, including the molecular basis of parasitic adaptations (Nakjang et al. 2013; Wadi and Reinke 2020; Williams et al. 2022), the lack of high-quality genomes for most major microsporidia clades in this little understood branch of the tree of life has limited comparative studies of genome architecture to single genera (see e.g. Mascarenhas dos Santos et al. 2023; Khalaf et al. 2024, 2025). The available short-read-based genome assemblies have provided inadequate insights into microsporidia genomic architecture, despite hinting at its high diversity.

Microsporidia exhibit highly derived genomic features for parasitism, including mitochondrion genome loss and nuclear genome compaction (Jespersen et al. 2022), the latter of which is highly variable among microsporidia clades, with different forms and levels of (non-)coding DNA loss. Extreme examples can be seen in the mammalian parasite genus Enterocytozoon, which lacks most genes involved in glycolysis and is therefore energetically completely dependent on its host (Wiredu Boakye et al. 2017), and in the Encephalitozoonidae family, which has reduced inter- and intragenic regions, including repeat content (Corradi and Slamovits 2011; Mascarenhas dos Santos et al. 2023). An opposite phenomenon, genome expansion, is observed secondarily in microsporidia with mixed-mode (horizontal and vertical) transmission, such as Nosema bombycis and Hamiltosporidium tvaerminnensis (Parisot et al. 2014; de Albuquerque et al. 2020). This likely occurs because vertical transmission reduces effective population size and thus the efficacy of natural selection to limit genome expansion, e.g. through the proliferation of transposable elements (Haag et al. 2020). While the quality of microsporidian genome assemblies is not as essential when it comes to comparing the length or presence of genetic elements, we do need high-quality assemblies to study the abundance and relative location of these genetic elements across species.

Although microsporidia are found in a wide range of host species, including humans and agriculturally important animals, from different environments, most microsporidia seem to occur in aquatic environments with the greatest richness found in Crustacea (Bojko and Stentiford 2022). Indeed, thanks, perhaps, to the vast abundance of Crustacea and the opportunistic nature of microsporidia (Weiss and Becnel 2014), Crustacea host microsporidia from all major clades (Bojko et al. 2022). Additionally, because many microcrustaceans such as the model system Daphnia are filter feeders, they accumulate infectious pathogen cells during feeding. Therefore, filter feeding provides the initial contact point and opportunity for microsporidia–host interactions. Daphnia alone are known to be regularly infected by species from at least four of the seven major microsporidia clades, as well as a microsporidium branch near the root of the microsporidian phylogeny that has retained a mitochondrion (Ebert 2008; Haag et al. 2014).

In this comparative genomic study, we aim to better understand genomic evolution in microsporidia by elucidating the causes of variation in their genomic features, and the arrangement and epigenetic modifications of their genome. Only recently, and only for a few microsporidia genera, have genomic architecture and methylation patterns been described. For example, a recently described genomic architecture for three species in the human-infecting genus Encephalitozoon found that each chromosome in the genomes of Encephalitozoon spp., from the ends to the core, consists of 5-mer telomeric repeats, telomere-associated repeat elements (TAREs), hypermethylated ribosomal RNA (rRNA) genes, less methylated subtelomeres, and a hypomethylated chromosome core (Mascarenhas dos Santos et al. 2023). Our aim is to evaluate the generality of this architecture by using high-quality genome resources to examine microsporidia from diverse clades of the phylum. We look at Encephalitozoon intestinalis, its close relative Ordospora colligata, Vairimorpha necatrix also of the Nosematida clade, Glugoides intestinalis of the sister clade Enterocytozoonida, Hamiltosporidium tvaerminnensis of the orphan clade, three microsporidia of the Amblyosporida clade (Binucleata daphniae, Gurleya vavrai, and Conglomerata obtusa [≡ Larssonia obtusa]), and one microsporidium with a mitochondrion, Mitosporidium daphniae. We further explore the processes that drive genome rearrangements and epigenetic modifications in these microsporidia. Previous studies have shown a relationship between mode of transmission and repeat content in the microsporidia genome (de Albuquerque et al. 2020; Haag et al. 2020), suggesting that, for example, life history contributes to the evolution of genomic architecture. As part of our study on genome organization and (epi)genetic features in these diverse microsporidia, we also contribute here the first high-quality genome assemblies and epigenetic annotations for a number of understudied microsporidia clades.

Results

Genome Assemblies

We generated genome assemblies for six microsporidia (Table 1), including first-time assemblies for B. daphniae, G. vavrai, and C. obtusa, as well as enhanced assemblies for M. daphniae, O. colligata, and G. intestinalis. To expand our taxon sampling, we also included available high-quality genomes of three other microsporidia: E. intestinalis (Mascarenhas dos Santos et al. 2023), H. tvaerminnensis (Angst et al. 2023), and V. necatrix (Svedberg et al. 2024) (Table 1). The overall length of the nine genome assemblies correlated negatively with their GC-content (F[1,7] = 19.44, P = 0.003, R² = 0.74; Fig. 1), following the trend seen in all (draft) microsporidia reference genome assemblies available in the NCBI database (Fig. S1, Table S1). The Amblyosporida, H. tvaerminnensis, and V. necatrix genomes had about 15 percentage points lower GC-content (22% to 29%) and were about three times longer (10–22 Mb) than the smaller E. intestinalis, O. colligata, G. intestinalis, and M. daphniae genomes (34% to 43% and 3–6 Mb). The larger genomes had a higher repeat content (26% to 50%) and a higher number of protein-coding genes (2,847–4,231) than the smaller genomes (12% to 20% and 1,768–2,840), except for G. intestinalis, whose genome consisted of 43% repeats (positive correlation between assembly length and total repeat length: F[1,7] = 36.35, P = 0.001, R² = 0.84). The protein-coding genes in larger genomes tended to be shorter on average (952–1,081 bp) than in smaller genomes (1,019–1,447 bp), except for H. tvaerminnensis (1,391 bp), whose protein-coding genes were closest to M. daphniae (1,447 bp) in average length.
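The reported relationship between assembly length and GC-content can be approximately re-derived from the values in Table 1. The following is a minimal sketch, assuming an ordinary least-squares fit of assembly length on GC-content; the statistical software actually used is not stated in this section, and the function name `simple_linear_regression` is purely illustrative. Because the table values are rounded, the result should land close to, but not necessarily exactly on, the reported F and R² values.

```python
# Values transcribed from Table 1, same species order in both lists.
gc_percent = [42.68, 21.97, 25.59, 29.04, 26.60, 37.53, 41.94, 28.34, 34.06]
length_mb = [5.83, 16.37, 18.13, 9.58, 21.64, 3.18, 2.61, 15.01, 5.13]


def simple_linear_regression(x, y):
    """Ordinary least squares of y on x; returns slope, intercept, R^2 and F."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    # With a single predictor, F on (1, n - 2) degrees of freedom follows from R^2.
    f_stat = r2 / (1 - r2) * (n - 2)
    return slope, intercept, r2, f_stat


slope, intercept, r2, f_stat = simple_linear_regression(gc_percent, length_mb)
print(f"slope = {slope:.3f} Mb per % GC, R^2 = {r2:.2f}, "
      f"F(1,{len(gc_percent) - 2}) = {f_stat:.2f}")
```

A negative slope here corresponds to the negative correlation described in the text: assemblies with lower GC-content tend to be longer.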

Table 1.

Assembly information for the analyzed microsporidia genomes

Feature | Mitosporidium daphniae | Gurleya vavrai | Conglomerata obtusa | Binucleata daphniae | Hamiltosporidium tvaerminnensis | Ordospora colligata | Encephalitozoon intestinalis | Vairimorpha necatrix | Glugoides intestinalis
Total length (Mb) | 5.83 | 16.37 | 18.13 | 9.58 | 21.64 | 3.18 | 2.61 | 15.01 | 5.13
GC-content (%) | 42.68 | 21.97 | 25.59 | 29.04 | 26.6 | 37.53 | 41.94 | 28.34 | 34.06
Contig N50 (bp) | 864,371 | 989,478 | 6,631 | 332,081 | 1,444,059 | 374,328 | 230,490 | 1,281,651 | 439,321
Contig number | 10 | 20 | 5,653 | 31 | 17 | 9 | 11 | 12 | 13
BUSCO completeness (%) | 37.6 | 90.3 | 88.5 | 85.7 | 94 | 99 | 99.5 | 96.8 | 83.8
Total length of repeats (Mb) | 0.69 | 4.63 | 4.72 | 3.52 | 8.56 | 0.62 | 0.4 | 7.45 | 2.21
Total length of repeats (%) | 11.79 | 28.27 | 26.02 | 36.79 | 39.56 | 19.54 | 15.18 | 49.68 | 43.06
Number of protein-coding genes | 2,840 | 3,537 | 4,231 | 2,847 | 3,573 | 1,768 | 2,074 | 2,971 | 2,331
Mean length of protein-coding genes (bp) | 1,447 | 978 | 952 | 1,049 | 1,391 | 1,114 | 1,019 | 1,081 | 1,127
Sequencing technology | ONT plus Illumina | ONT plus Illumina | Illumina | ONT plus Illumina | PacBio CLR plus ONT | PacBio HiFi | ONT plus Illumina | PacBio HiFi | PacBio HiFi
Publication | new | new | new | new | Angst et al. (2023) | new | Mascarenhas dos Santos et al. (2023) | Svedberg et al. (2024) | new

BUSCO completeness was calculated using the microsporidia-specific microsporidia_odb10 database, which is not tailored to lineages branching near the root of the microsporidian phylogeny, such as M. daphniae.
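For readers unfamiliar with the contiguity metric in Table 1, contig N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that standard definition follows; the function name `n50` and the toy contig lengths are illustrative and not taken from the assemblies.

```python
def n50(contig_lengths):
    """Return the contig N50 for a list of contig lengths (in bp)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy example: total = 30, so the half-way point (15) is reached at the 8 bp contig.
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))
```

Under this definition, a fragmented short-read assembly with thousands of small contigs yields a low N50 even when its total length is large.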

Fig. 1.

Relationship between genome length, total repeat length, and GC-content in nine microsporidia species. Assembly length (black dots and line) and total length of repeats (gray dots and line) correlated negatively with GC-content. Species names are displayed to the right of the plot at the corresponding GC-content.

Our improved O. colligata and G. intestinalis genome assemblies were more complete than previous assemblies, containing subtelomeric sequences for most chromosomes (Table S2). Our M. daphniae assembly was much more contiguous than the previous draft assembly (from 612 down to 10 contigs; Haag et al. 2014), but we only found one contig with a telomeric repeat (TTAGGG; Table S2). Our assembly of the M. daphniae linear mitochondrial genome was nearly identical to the previous assembly in the central, non-repetitive segment but captured more of the terminal inverted repeats (total length = 17,083 bp, including inverted repeats of about 2,030 bp at each end; previous total length = 14,043 bp, Haag et al. 2014). Phylogenetic inference confirmed previous rRNA gene phylogenies, placing our eight core microsporidia genomes into four different clades; M. daphniae was placed with the other microsporidium with a mitochondrion Paramicrosporidium saccamoebae (Fig. 2 and Fig. S2). Our first high-quality genome assemblies in the Amblyosporida clade were all from the Gurleyidae family. The maximum likelihood phylogeny derived from the concatenated gene sequences (Fig. 2) was similar to the weighted phylogeny derived from the individual maximum likelihood gene trees (Fig. S2), except at the known, unresolved positions in the microsporidia phylogeny (South et al. 2024).

Fig. 2.

Maximum likelihood phylogeny of the microsporidia with Rozella allomycis as an outgroup. In addition to the species discussed here (highlighted), we selected a representative sample of microsporidia species with available data for phylogenetic inference based on 15 single-copy orthologs. Node labels represent bootstrap values from 1,000 runs, and major microsporidia clades are delineated according to Bojko et al. (2022).

Genome Architecture and Synteny

Our reference and starting point for this comparison of microsporidia genome architecture was the well-described chromosome arrangement of Encephalitozoon (Mascarenhas dos Santos et al. 2023). We first compared E. intestinalis to its close relative O. colligata, finding them to have a similar genome architecture with a 5-mer telomeric repeat (TTAGG) followed by a large and a small rRNA gene each (= rRNA operon), which we found on both ends of six out of the nine contigs and on one end of each of the other three contigs (Fig. 3, Table S2). The next most closely related species, V. necatrix and G. intestinalis, had the same 5-mer telomeric repeat and a 4-mer telomeric repeat motif (TTAG), respectively. In G. intestinalis, rRNA genes followed the telomeric repeats on both ends of four contigs, on one end of seven contigs, and were absent in two contigs with telomeric repeats, indicating variation in chromosomal arrangement. The V. necatrix genome was more repetitive than its closest relatives, with more repeats between telomeres and rRNA genes, and with variation in their presence and number. In the other major clades containing Amblyosporida and H. tvaerminnensis, the genome architecture differed even more widely. The rRNA operons were not bound to the subtelomeres and, in G. vavrai and H. tvaerminnensis, occurred in tandem with tail-to-tail orientation. Hamiltosporidium tvaerminnensis had ten rRNA operons at five locations, and G. vavrai had 14 copies at seven locations. Each pair of juxtaposed rRNA operons surrounded specific LINE repeats that were predominantly found there. Binucleata daphniae and G. vavrai were found to contain the previously described telomeric repeat motif of H. tvaerminnensis (TTAGGG; Angst et al. [2023]). From the less contiguous genome assemblies of B. daphniae and C. obtusa, we could recover only a few rRNA genes from the former and no rRNA genes nor telomeric repeats from the latter.
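The presence of telomeric repeats at contig ends, as tallied above, can be checked with a simple pattern search. The sketch below is one possible way to do this and is not the authors' pipeline: the function names, the 200 bp search window, and the minimum of four tandem copies are all assumptions chosen for illustration. Both the motif and its reverse complement are searched, since the strand on which the repeat appears depends on which end of the contig is inspected.

```python
import re


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomeric_ends(contig, motif="TTAGG", window=200, min_copies=4):
    """Report which contig ends carry a tandem run of the motif on either strand.

    window and min_copies are illustrative thresholds, not published values.
    """
    contig = contig.upper()
    patterns = [
        re.compile(f"(?:{m}){{{min_copies},}}")
        for m in (motif, reverse_complement(motif))
    ]
    start, end = contig[:window], contig[-window:]
    return {
        "start": any(p.search(start) for p in patterns),
        "end": any(p.search(end) for p in patterns),
    }


# Toy contig: reverse-complemented repeats at the start, forward repeats at the end.
toy_contig = "CCTAA" * 6 + "ACGTACGTAC" * 50 + "TTAGG" * 6
print(telomeric_ends(toy_contig))
```

Passing `motif="TTAG"` or `motif="TTAGGG"` would apply the same check to the 4-mer and 6-mer motifs mentioned in the text.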

Fig. 3.

Summary of (epi)genetic features across species. Each row in the maximum likelihood phylogeny corresponds to a species and its associated (epi)genetic features; NA indicates that data is not available. The phylogeny is based on 155 single-copy orthologs and node labels represent bootstrap values from 1,000 runs.

Overall, we found limited concordance in sequence arrangement (or synteny) across the analyzed microsporidia unless they were closely related (Fig. 4). For example, although single-copy BUSCO genes, which are preserved due to their importance for organismal function and survival, occurred in conserved linkages across closely related species, only a few BUSCO linkage groups were shared by all core microsporidia. This was as expected due to the high degree of species divergence. Throughout the phylogeny, we found no centromeric regions based on chromosome-wide GC or repeat content, supporting hypotheses that microsporidian chromosomes might be holocentric, or their centromeres might be epigenetically defined (Mascarenhas dos Santos et al. 2023; Khalaf et al. 2024).

Fig. 4.

Ribbon diagram with maximum likelihood phylogeny showing synteny limited to closely related microsporidia. Contigs, some of which represent complete chromosomes, are represented by horizontal bars and connected by ribbons that connect the location of individual single-copy BUSCO genes across assemblies. Bold-type ribbons represent statistically conserved linkages. Because of its lower assembly contiguity, Conglomerata obtusa was omitted from the analysis.

Repeat Landscapes

Encephalitozoon spp. have few to no repeats except at chromosome ends, which are all highly similar (Mascarenhas dos Santos et al. 2023). However, as seen in Fig. 1, the genomes analyzed here showed high variation in repeat content, suggesting that repeats may play a role in shaping the evolution of microsporidian genomes. Using the Kimura substitution level, which estimates divergence from a repeat consensus sequence, we assessed repeat content and its activity over evolutionary time. High substitution levels indicate high divergence related to activity deep in the evolutionary past, and low substitution levels indicate low divergence and therefore recent activity. The largest genomes showed high repeat content and constant repeat activities throughout their past (Fig. 5). The three Amblyosporida species shared similar repeat activity, especially in their early past, indicative of the activity in a common ancestor. DNA and LTR transposons were highly active in all three species, but G. vavrai, and especially B. daphniae, showed higher activity in the recent past. The apparent lack of recent activity in C. obtusa might be an artifact of the low divergence repeats collapsed in this short-read assembly. Recent increases in repeats compared to earlier activity were more pronounced in species with smaller genomes, such as LINEs in G. intestinalis. These observations all indicate that different transposon families could shape the larger genomes. As species with shorter genomes had less activity inferred in their more distant past, repeats may have only recently been expanded or removed in the shorter genomes.
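To make the divergence measure concrete, the following sketch computes the plain Kimura two-parameter distance between one aligned repeat copy and its consensus sequence. It is an illustration under stated assumptions: the function name `kimura_2p` is hypothetical, the inputs are assumed to be pre-aligned strings of equal length, and repeat-annotation tools may apply additional corrections (for example for CpG sites) that are not reproduced here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(copy, consensus):
    """Kimura two-parameter distance between two aligned sequences.

    K = -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among compared sites.
    """
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too diverged (saturation).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One A<->G transition in ten sites gives a small, "recent" divergence.
print(round(kimura_2p("GCGTACGTAC", "ACGTACGTAC"), 4))
```

Binning such per-copy distances, weighted by copy length, gives the kind of histogram commonly shown as a repeat landscape, with low values toward the recent past.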

Fig. 5.

Repeat landscapes of microsporidia. Low Kimura substitution levels represent low divergence from the repeat consensus sequence, corresponding to recently expanded sequences, whereas high substitution levels indicate high divergence, indicating earlier activity. Genome assembly lengths are indicated in brackets.

Methylation

Epigenetic modifications may act to regulate not only gene expression but also repeat element activity and the associated variation in genome architecture. Encephalitozoon has hypermethylated chromosome ends with rRNA genes and hypomethylated chromosome cores with mRNA genes (Mascarenhas dos Santos et al. 2023). Using a genome ontology approach based on hypergeometric tests, we compared these methylation patterns with those in the genomes for which we have epigenetic information. From our species with available ONT data for methylation analysis, we found that the closest Encephalitozoon relative, G. intestinalis, showed the most similar pattern: hypermethylated rRNA genes at the chromosome ends and hypomethylated mRNA genes at the chromosome cores (Table 2 and Table S3). Unlike Encephalitozoon, however, G. intestinalis had LTRs and hypermethylated LINE repeats. In the other species with available ONT data, H. tvaerminnensis, G. vavrai, and B. daphniae also had hypermethylated rRNA genes, albeit not statistically significantly so in H. tvaerminnensis, and tended to have hypermethylated transposable repeat elements like DNA, LINE, and LTR, even though their genome architectures were very different from those of Encephalitozoon and G. intestinalis. Non-transposable repeats, like simple repeats and low-complexity regions, were hypomethylated in all species. Coding sequences were hypomethylated in the H. tvaerminnensis genome, as in Encephalitozoon and G. intestinalis, but were hypermethylated in G. vavrai and B. daphniae. Differential methylation of coding sequences and repeat elements could indicate gene expression and repeat activity regulation, respectively. The two repeat elements that expanded most recently (LINEs in G. intestinalis and LTRs in B. daphniae) were hypermethylated, but this was statistically significant only in G. intestinalis.

Table 2.

Methylation patterns of microsporidia

Gurleya vavrai Binucleata daphniae Hamiltosporidium tvaerminnensis Encephalitozoon intestinalis Glugoides intestinalis
DNA repeat −21.86 *** 4.41 * −202.4 ***
LINE repeat 1.46 NS 5.06 ** −11.48 *** −136.83 ***
Low complexity 21.08 *** 4.44 * 13.75 *** 2.90 NS 1 NS
LTR −91.96 *** −1.91 NS −7.53 *** −1.58 NS
mRNA −4.91 ** −5.39 ** 19.55 *** 1,476.72 *** 77.31 ***
RC repeat −1.25 NS
rRNA −35.97 *** −72.86 *** −1.05 NS −2,936.43 *** −107.06 ***
Satellite repeat −2.73 NS
Simple repeat 62.19 *** 14.72 *** 31.30 *** 17.22 *** 7.99 ***
Unknown repeat −1.9 NS −17.66 *** −8 *** −163.04 *** −200.41 ***

Overrepresented (= hypermethylated) genomic features have a negative log P-value, whereas underrepresented (= hypomethylated) genomic features have a positive log P-value. Asterisks denote levels of statistical significance obtained from genome ontology. Specifically, *P < 0.05, **P < 0.01, ***P < 0.001, and NS = not significant.
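The sign convention used in Table 2 can be illustrated with a small hypergeometric enrichment test. The sketch below is an assumption-laden illustration rather than the tool used in the study: it treats genomic positions as a population of size N, of which K fall in a feature class and n are methylated, and asks whether the observed overlap k is larger or smaller than expected. The function names, the use of the natural logarithm, and the toy counts are all hypothetical.

```python
from math import comb, log


def hypergeom_tail(k, N, K, n, upper=True):
    """P(X >= k) if upper else P(X <= k), for X ~ Hypergeometric(N, K, n)."""
    lo, hi = max(0, n - (N - K)), min(K, n)
    ks = range(max(k, lo), hi + 1) if upper else range(lo, min(k, hi) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in ks) / comb(N, n)


def signed_log_p(k, N, K, n):
    """Log P-value with the sign convention of Table 2.

    Negative: feature overrepresented among methylated sites (hypermethylated).
    Positive: feature underrepresented among methylated sites (hypomethylated).
    """
    expected = n * K / N
    if k >= expected:
        return log(hypergeom_tail(k, N, K, n, upper=True))
    return -log(hypergeom_tail(k, N, K, n, upper=False))


# Toy numbers: 20 positions, 5 in the feature, 5 methylated.
print(signed_log_p(4, 20, 5, 5))  # 4 of 5 methylated sites in the feature: enriched
print(signed_log_p(0, 20, 5, 5))  # none in the feature: depleted
```

At genome scale the tail probabilities become extremely small, so a production implementation would work in log space throughout rather than dividing exact binomial coefficients as done here.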

Present and Absent Protein Domains

Genome streamlining in microsporidia has resulted in the loss of protein domains that are considered essential in free-living fungi (Haag et al. 2014; Jespersen et al. 2022). Using comparative genomics, we identified present and absent metabolic and functional capacities across microsporidia, looking at which functions were retained in all or most species and which were reduced. We began by summarizing the presence and absence of protein domains (Pfam), that is, protein families characteristic of biological pathways (Table S4). Among the nine microsporidia genomes, M. daphniae had, as expected, the most exclusive protein domains (656), even more than the number of protein domains shared across all nine species (444). These included mitochondrial carrier proteins (PF00153), which are characteristic of the transfer of molecules across mitochondrial membranes (M. daphniae has a mitochondrion) and were absent in all mitochondria-free canonical microsporidia (Fig. 6). Aspartyl protease domains, e.g. PF13650, were among the most abundant protein domains with presence/absence polymorphism among the nine microsporidia (Fig. 6). These endopeptidases are known virulence factors of pathogenic fungi (Mandujano-González et al. 2016). Looking at all species used in our phylogeny (Fig. 2), we found that the Fructose-bisphosphate aldolase class-I family (PF00274), which is characteristic of functional glycolysis, was absent in Enterocytozoon bieneusi (Table S4), consistent with previous findings that glycolysis is reduced in this species, which obtains its energy directly from the host (Wiredu Boakye et al. 2017). Protein domains like ABC transporter (PF00005), Hsp70 (PF00012), and actin (PF00022) were present in all species (Table S4) and thus appear to be essential for microsporidia.
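The tallies of shared and exclusive domains reduce to set operations on per-species domain lists. The sketch below shows the idea on a toy input. Only the Pfam IDs themselves come from the text; the function name `summarize_domains`, the two placeholder species, and the assignment of PF13650 to one of them are hypothetical and do not reproduce Table S4.

```python
def summarize_domains(domains_by_species):
    """Return (domains shared by all species, domains exclusive to each species)."""
    shared = set.intersection(*domains_by_species.values())
    exclusive = {}
    for species, domains in domains_by_species.items():
        # Union of the domain sets of every other species.
        others = set().union(
            *(d for s, d in domains_by_species.items() if s != species)
        )
        exclusive[species] = domains - others
    return shared, exclusive


# Toy input; placeholder species and domain assignments are illustrative only.
toy = {
    "Mitosporidium daphniae": {"PF00005", "PF00012", "PF00022", "PF00153"},
    "hypothetical species A": {"PF00005", "PF00012", "PF00022", "PF13650"},
    "hypothetical species B": {"PF00005", "PF00012", "PF00022"},
}
shared, exclusive = summarize_domains(toy)
print(sorted(shared))
print({species: sorted(ids) for species, ids in exclusive.items()})
```

Domains that are neither shared by all species nor exclusive to one are the variably present ones, which is the category summarized in Fig. 6.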

Fig. 6.

Variably present protein domains (Pfam) ordered from most to least abundant. Rows list individual protein domains with their Pfam domain description and ID (in parentheses), while columns with a maximum likelihood phylogeny at the top list the nine species. The penultimate column tallies the number of protein domains present, while the last column provides the proportion of species in which they occur. Full data are presented in Table S4.

Discussion

Microsporidia are intracellular parasites that are enigmatic due to their high degree of specialization. Our goal was to better understand the evolution of their genome by producing several new high-quality genome assemblies that reveal the (epi)genetic architecture across this taxon. We focused mainly on defining which features are general for all microsporidia and which are derived in subclades. GC-content, genome length, chromosomal sequence arrangement, and methylation patterns vary widely among microsporidian clades; however, species that share a more recent common ancestor tend to exhibit less variation in these features, similar to other systems with high-quality genomes available for study. The architecture of microsporidia genomes studied here was only similar to the previously described Encephalitozoon in one close relative and in a species of the next closest major microsporidian clade, proving that Encephalitozoon's chromosomal sequence arrangement is not general to all microsporidia and challenging previous speculation of conservation of genome architecture across the microsporidia (Brugère et al. 2000). Our other studied species showed very diverse chromosomal arrangements and repeat contents, although the same genetic features tend to be hyper- or hypomethylated in all species. For example, rRNA gene positions are different among all microsporidian clades, but hypermethylated in most species. Because the new and improved microsporidia genome assemblies presented here are all from species that infect the aquatic crustacean Daphnia, we can see that the variation in genome architecture is not explained by the host: indeed, the host's role in parasite genome architecture appears to be minor. As models, thus, Daphnia and its microsporidian parasites from across the phylum allow us to extend microsporidia research and improve our understanding of genome evolution in these extreme parasites beyond the mammal-parasite model.

Nucleotide Composition Linked to Genome Size

Nucleotide composition is fundamental to genome architecture, and although GC-content varies widely across all lifeforms, it tends to evolve to lower values in parasites such as the microsporidia. Reduced recombination rates, diminished DNA repair mechanisms, relaxed purifying selection, and increased selection for AT mutations have all been proposed as reasons for reduced GC in parasite genomes (Videvall 2018). These mechanisms may play a role in the negative correlation observed here between GC-content and repeat content, which is also seen in fungi and other related phyla (Elliott and Gregory 2015). First, it might be easier for repeat elements to proliferate in species with relaxed purifying selection due to the small effective population size relative to the mode of transmission (Haag et al. 2020). We expect species with a small effective population size to have more and more recent expansions of repeat elements due to vertical transmission compared to species with a higher effective population size and horizontal-only transmission (de Albuquerque et al. 2020). With our sample, it is hard to draw conclusions on demographic effects; however, long-genome Amblyosporida and Nosema/Vairimorpha clade species are known for complex life cycles with mixed-mode transmission (Vávra et al. 2018; Xiong et al. 2023), and mixed-mode transmission in the large-genome microsporidium H. tvaerminnensis has been associated with small effective population sizes (Haag et al. 2020). Conversely, species with small genomes are only transmitted horizontally. Second, AT mutations might accumulate in proliferating repeat elements, directly and indirectly accelerating the decrease in GC-content. This occurs indirectly if 5mC methylation is used to slow the proliferation of repeat elements because fewer positions are available for this epigenetic modification. In contrast, GC-richer repeats might be more of a methylation target, possibly constraining their proliferation. Although repeat elements were hypermethylated across species, repeat activity was more evenly distributed across time in lower GC species than in higher GC species, where activity occurred mainly in the recent past. This could be because higher GC species have more stringent selection against repeat proliferation, more efficient mechanisms to remove repeats, higher recombination rates, or lower tendency for AT mutation, such that our methodology estimates higher recent repeat activity because of the lower divergence among them.

Distribution of rRNA Genes

The rRNA genes are of special interest in microsporidia genome architecture, mainly because of their unusual location in Encephalitozoon close to each chromosome end, at the subtelomeres (Mascarenhas dos Santos et al. 2023). Contrary to previous findings from a short-read-based draft genome (Pombert et al. 2015; Mascarenhas dos Santos et al. 2023), we show that O. colligata shares this structure with its close Encephalitozoon relatives. This highlights the value of high-quality resources for studying genome architecture. rRNA genes are also located at some, but not all, of the subtelomeres in a species of the next closely related major clade, G. intestinalis, which also shows small-scale rRNA gene rearrangements (Refardt and Mouton 2007). Given that transposable elements (TEs) show high and recent activity in G. intestinalis, that rRNA genes and TEs are co-located, and that TEs interact with and benefit from each other (Garcia et al. 2024), we suggest that rRNA genes and TEs may coevolve in this microsporidium or even across microsporidia, a hypothesis supported also by our finding that rRNA genes and TEs are co-located in H. tvaerminnensis and Amblyosporida microsporidia. Moreover, rRNA genes in these species have yet another outstanding arrangement: rRNA operons occur in tandem with tail-to-tail orientation and few LINE or unknown repeats between them. These rRNA operon pairs are not located close to a chromosome end, however, suggesting that the proximity of rRNA genes to chromosome ends is a derived characteristic found in some species of the Nosematida and Enterocytozoonida clades.

Co-Occurrence and Epigenetic Modification of Repeat Elements

The significance of methylation is unclear in microsporidia. Mascarenhas dos Santos et al. (2023) have speculated that methylation at rRNA genes facilitates heterochromatin formation, helping to silence the ribosomal machinery during phases of the life cycle that have low or no access to energy. Indeed, we found rRNA genes to be hypermethylated in most of the species studied, regardless of their location. The hypothesis proposed by Mascarenhas dos Santos et al. (2023) could also apply to other genetic loci, such as TEs, which, because their activity may be unfavorable, are silenced by epigenetic modifications that facilitate heterochromatin formation. Whereas small microsporidia genomes have low to no repeats, including TEs, and hypomethylated cores, larger genomes might show methylation at repeat loci across the genome to reduce TE activity through heterochromatin formation. Given that rRNA genes and transposable elements co-occur in the studied microsporidia, hypermethylation could help to simultaneously control their activity and spread. At the same time, their co-occurrence might be beneficial to the TEs, which might be more likely to proliferate or less likely to be removed if they occur close to the rRNA genes, which are highly expressed. The microsporidia V. necatrix and Nosema muscidifuracis from the same clade as Encephalitozoon and Ordospora have more randomly distributed rRNA operons (and N. muscidifuracis has many more), while both have much lower whole-genome GC-content than their relatives (Xiong et al. 2023). A telomere-to-telomere assembly of the N. muscidifuracis genome would help pin down rRNA gene location and methylation data for the two species to clarify the role of epigenetic modification in the evolution of microsporidia repeat families. Neither epigenetic, nucleotide, nor sequence composition has indicated the location of centromeres in the species studied here. Hi-C sequencing could help identify centromeres (Khalaf et al. 2024).

Conclusion

Microsporidia have attracted much attention because of their parasitic life history and because they include some of the smallest known eukaryote genomes. However, their genomes also exhibit drastic variation in size and other features. Here, we present high-quality genome assemblies from across the major microsporidia clades, showing covariance in GC-content, chromosomal sequence arrangement, and epigenetic modification. Species with high GC-content have a sequence arrangement similar to Encephalitozoon spp., with hypermethylated subtelomeres and hypomethylated cores. In contrast, low-GC species do not share this pattern, having instead a higher activity of repeat elements, which have lower GC-content than coding sequences, likely because the latter are under purifying selection (Videvall 2018). Low-GC genomes might arise due to a lowered selection regime that allows transposable repeats to proliferate more easily (Videvall 2018; de Albuquerque et al. 2020). The evolutionary dynamics leading to the discrepancy between relatively strong selection on coding sequences and weak selection outside of them remain unknown. Still, the activity of repeat elements clearly shapes the evolution of genome architecture and, given the recent advent of high-throughput sequencing technologies, should now be included in comparative studies of GC-content, from which it was previously omitted.

Although the streamlining of genomes has led to significantly fewer coding sequences across all core microsporidia (Žárský et al. 2023), certain species show extraordinary genome reductions, like the partial absence of glycolysis genes in Enterocytozoon spp. (Wiredu Boakye et al. 2017). Here, we provide an overview of the presence and absence of protein domains across microsporidia as a resource for studies focused on presence/absence polymorphism and functional adaptations to parasitism across the microsporidia clades. Knowing the phylogenetic distribution of protein domains and other genomic features in the microsporidia allows us to dive deeper into the evolution of this extraordinary group of parasites. Features that are conserved across all microsporidia may have contributed to the radiation of this group, with the loss of typical mitochondria being the hallmark example. In contrast, features present only in individual microsporidian clades or species may have been important in evolving adaptations to specific hosts or life histories. Furthermore, correlations between the presence of certain genomic features and life history traits can shed light on the evolutionary drivers of these traits. Thus, the presence or absence of protein domains and other genomic features can be the starting point for in-depth comparative study of the microsporidia and their genomic architecture with these high-quality genomes. As a final step toward obtaining telomere-to-telomere assemblies, especially for the harder-to-assemble, longer genomes, we suggest continuing to integrate Hi-C sequencing into microsporidian genomics (Khalaf et al. 2025).

Methods

Samples

Using long wide-mouth pipettes, we collected Daphnia pulex infected with C. obtusa from the Tvaerminne archipelago, Finland (59°49′55.4″ N, 23°15′17.6″ E), placed them in RNAlater (Ambion, Glasgow, United Kingdom), and kept them refrigerated until DNA isolation. Daphnia pulex infected with G. vavrai were collected from Aegelsee lake near Frauenfeld, Switzerland (47°33′28.0″ N, 8°51′46.0″ E) and were transported to the laboratory for DNA isolation. Daphnia magna infected with B. daphniae (ID: BE-OM-3), G. intestinalis (KZ-23-1), M. daphniae (CH-H-2014-2299), and O. colligata (GB-LK1-1) were previously collected and kept in the lab as iso-female lines (Refardt et al. 2008; Pombert et al. 2015) until DNA isolation. We obtained high-molecular-weight DNA from Daphnia with parasites using the GenePure DNA Isolation Kit (QIAGEN, Hilden, Germany) as described in Angst et al. (2025) (https://dx.doi.org/10.17504/protocols.io.5jyl82n96l2w/v1; Angst and Fields 2024). Illumina paired-end sequencing and either PacBio HiFi (Pacific Biosciences high fidelity) or ONT (Oxford Nanopore Technologies) sequencing were used to obtain the highest quality nucleotide or modification calls, respectively. Illumina libraries were prepared using Kapa PCR-free kits and sequenced by the Quantitative Genomics Facility service platform at the Department of Biosystem Science and Engineering (D-BSSE, ETH) Basel, Switzerland, on an Illumina NovaSeq 6000 sequencer. PacBio SMRTbell libraries were prepared and sequenced on a PacBio Revio at the Lausanne Genomics Technologies Facility (GTF, UNIL), Switzerland. We prepared ONT libraries with the SQK-LSK110 ligation kit and sequenced them using a MinION device with a Spot-ON Flow Cell (R9.4.1). Additionally, we downloaded genomic data of E. intestinalis (NCBI database; Assembly name: ASM2439929v1; GenBank assembly accession: GCA_024399295.1; SRA accession: SRR17865590; Bioproject accession: PRJNA594722; Mascarenhas dos Santos et al. 2023), H. tvaerminnensis (NCBI database; Assembly name: FIOER33 v3; GenBank assembly accession: GCA_022605425.2; SRA accession: SRR24575619; Bioproject accession: PRJNA778105; Angst et al. 2023), and V. necatrix (NCBI database; Assembly name: ASM3663032v1; GenBank assembly accession: GCF_036630325.1; Bioproject accession: PRJNA909071; Svedberg et al. 2024) for analysis.

Assembly

Glugoides intestinalis and O. colligata

Glugoides intestinalis (total/average read length: 79.38 Gb/10.75 Kb) and O. colligata (20.22 Gb/8.84 Kb) were sequenced using PacBio HiFi sequencing to an average sequencing coverage of 55× and 45×, respectively. We used hifiasm v.0.19.8-r603 (Cheng et al. 2021) to assemble the obtained sequencing reads with default parameters for G. intestinalis and with --hom-cov 45 -l1 --n-hap 1 parameters for O. colligata. Additionally, we generated a second hifiasm assembly for both species after excluding host-specific PacBio HiFi reads, which were identified by mapping them to the D. magna genome (NCBI database; Assembly name: ASM4014379v1; GenBank assembly accession: GCA_040143795.1; BioProject ID: PRJNA624896; Cornetti et al. 2024) using minimap2 v.2.20-r1061 (Li 2018) and removed using SAMtools v.1.7 (Danecek et al. 2021). For each species, we aligned the two assemblies using minimap2 and merged them manually. We assessed the quality and completeness of the assemblies using QUAST v.5.0.2 (Gurevich et al. 2013) and BUSCO v.5.5.0 (Manni et al. 2021) with the microsporidia_odb10 database (Creation date: 2020-08-05). All software was used with default parameters unless stated otherwise.
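The host-read exclusion step can be illustrated with a minimal Python sketch. It assumes, as one plausible reading of the text, that reads left unmapped after alignment to the host genome are the ones retained; the exact filter used in the study is not stated. The function name and the toy SAM records are hypothetical. The sketch relies only on the SAM convention that FLAG bit 0x4 marks an unmapped read, which is the same bit that `samtools view -f 4` selects.

```python
from typing import Iterable, Iterator

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def non_host_read_names(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield names of reads that did not map to the host reference."""
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            yield name


# Toy records for illustration only; not data from the study.
example_sam = [
    "@SQ\tSN:host_chr1\tLN:1000",
    "read_a\t0\thost_chr1\t10\t60\t5M\t*\t0\t0\tACGTA\t*",
    "read_b\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\t*",
    "read_c\t16\thost_chr1\t50\t60\t5M\t*\t0\t0\tGGCAT\t*",
]
kept = list(non_host_read_names(example_sam))
print(kept)
```

In this toy input only `read_b` carries the unmapped flag, so it is the only read that would be passed on to a host-depleted assembly.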

Binucleata daphniae and G. vavrai

We sequenced B. daphniae (total/N50: 0.72 Gb/7.21 Kb) and G. vavrai (3.96 Gb/3.53 Kb) using ONT sequencing in adaptive mode, depleting for sequences homologous to the D. magna and the D. pulex genome (NCBI database; Assembly name: PA42 4.2; GenBank assembly accession: GCA_911175335.1; BioProject ID: PRJEB46221; Ye et al. 2021), respectively. We used bonito v.0.5.3 (https://github.com/nanoporetech/bonito) to base-call ONT sequencing reads with the dna_r9.4.1_e8.1_sup@v3.3 model. For the B. daphniae genome assembly, we used MaSuRCA v.4.0.5 (Zimin et al. 2017) with ONT (53×) and Illumina (639×) sequencing reads, excluding host-specific, adaptor, and low-quality sequences using minimap2, bwa-mem2 mem v.2.2.1 (Vasimuddin et al. 2019), SAMtools, and fastp v.0.23.2 (Chen 2023) with -q 30 and -l 100 parameters. Haplotigs were purged twice from the assembly using the Purge Haplotigs pipeline v.1.1.2 (Roach et al. 2018) with -l 125 -m 220 -h 400 parameters and BEDTools v.2.30.0 (Quinlan 2014) for generating the read-depth distribution. For the G. vavrai genome assembly, we used nextDenovo v.2.5.0 (Hu et al. 2024) with ONT sequencing reads (98×) and polished the resulting assembly with Illumina sequencing reads (458×) using nextPolish v.1.4.1 (Hu et al. 2020), Medaka v.1.7.2 (https://github.com/nanoporetech/medaka) and Pilon v.1.24 (Walker et al. 2014). The Purge Haplotigs pipeline was applied with -l 88 -m 176 -h 300 parameters.
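For readers unfamiliar with the N50 statistic reported for the ONT reads above, the following minimal Python sketch shows the standard definition: the length L such that sequences of length L or longer contain at least half of all bases. The function name and the toy lengths are illustrative assumptions and do not reproduce the study's data or tooling.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence downwards until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


read_lengths = [2, 3, 4, 5, 6, 10]  # toy values, not data from the study
print(n50(read_lengths))  # prints 6
```

Here the total is 30 bases; the two longest reads (10 and 6) together reach 16, which passes the halfway mark of 15, so the N50 is 6.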

Mitosporidium daphniae

We excluded D. magna reads from the newly generated Nanopore sequencing data (total/N50: 7.75 Gb/5.84 Kb) and the previously generated PacBio (Clone ID: CH-H-2299; Dexter et al. Under Review) and Illumina (Clone IDs: CH_H_2018_f_44, CH_H_2018_f_64, CH_H_2018_i_100, MS_2016_h_54, MS_2016_i_23, MS_2016_j_39; NCBI database, SRA accessions: SRR31898418-SRR31898423, Bioproject ID: PRJNA1206819) reads, all from clones from Aegelsee lake, Switzerland (47°33′28.0″ N, 8°51′46.0″ E), using minimap2, bwa-mem2 mem, and SAMtools. Because the long-read average sequence coverage of the parasite genome was only 10×, we generated a hybrid assembly of the long reads with the short reads (139×) using MaSuRCA. To identify microsporidian contigs in the obtained assembly, we ran BLAST homology searches using runTaxonomizedBLAST.pl v.0.3 (Mascarenhas dos Santos et al. 2023). We retained contigs if they met several criteria, i.e., length >1 Kbp, average GC content within 3 SDs of the mean GC of all identified contigs with homology to microsporidia (between 0.38 and 0.46), and average Illumina sequencing coverage within ± 30× of the mean coverage of all contigs with identified homology to microsporidia (between 105× and 165×). We removed contigs with coverage over 1× for a parasite-free D. magna sample (NCBI database; SRA accession: SRS3581327, Bioproject ID: PRJNA480405; Fields et al. 2018) and with homology to species other than microsporidia using BBMap v.38.96 (Bushnell 2014) and seqtk v.1.3 (https://github.com/lh3/seqtk).
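The three retention criteria described above (length, GC content, and coverage) can be expressed as a short Python sketch. This is an illustration under stated assumptions rather than the authors' pipeline: the `Contig` fields, the function name, and the toy values are hypothetical, and the use of the sample standard deviation is an assumption because the text does not say which estimator was applied.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List


@dataclass
class Contig:
    name: str
    length_bp: int
    gc_fraction: float         # GC content as a fraction between 0 and 1
    coverage: float            # mean Illumina read depth
    microsporidian_hit: bool   # True if BLAST found homology to microsporidia


def filter_contigs(
    contigs: List[Contig],
    min_length_bp: int = 1000,
    gc_sd_limit: float = 3.0,
    coverage_window: float = 30.0,
) -> List[Contig]:
    """Keep contigs that resemble the contigs with microsporidian homology."""
    reference = [c for c in contigs if c.microsporidian_hit]
    if len(reference) < 2:
        raise ValueError("need at least two reference contigs to estimate an SD")
    gc_mean = mean([c.gc_fraction for c in reference])
    gc_sd = stdev([c.gc_fraction for c in reference])  # sample SD (assumption)
    cov_mean = mean([c.coverage for c in reference])
    return [
        c
        for c in contigs
        if c.length_bp > min_length_bp
        and abs(c.gc_fraction - gc_mean) <= gc_sd_limit * gc_sd
        and abs(c.coverage - cov_mean) <= coverage_window
    ]


# Toy contigs for illustration only; not data from the study.
contigs = [
    Contig("c1", 50000, 0.42, 135.0, True),
    Contig("c2", 30000, 0.41, 130.0, True),
    Contig("c3", 20000, 0.43, 140.0, True),
    Contig("c4", 800, 0.42, 134.0, False),     # too short
    Contig("c5", 15000, 0.60, 136.0, False),   # GC outlier
    Contig("c6", 12000, 0.42, 20.0, False),    # coverage too low
    Contig("c7", 9000, 0.415, 150.0, False),   # passes all three criteria
]
kept_names = [c.name for c in filter_contigs(contigs)]
print(kept_names)
```

With these toy values the reference contigs give a mean GC of 0.42, an SD of 0.01, and a mean coverage of 135×, so `c7` is retained even though it lacks a direct BLAST hit, whereas `c4`, `c5`, and `c6` each fail one criterion.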

We separately assembled the mitochondrial genome of M. daphniae with flye v.2.9.2 (Kolmogorov et al. 2019) and a genome size estimate of 15 Kbp, using long reads that mapped neither to the D. magna nor to the M. daphniae nuclear genome but did map to the previously reported M. daphniae mtDNA assembly (NCBI database; GenBank accession: MW864067.1; Haag et al. 2014). We annotated the mitochondrial genome using MFannot (Lang et al. 2023).

Conglomerata obtusa

Because we were unable to generate genomic material of sufficient quality for long-read sequencing of C. obtusa, we used Illumina sequencing (153×) for this species only. We excluded sequencing reads mapping to the D. pulex host genome and assembled the remaining reads using megahit v.1.2.9 (Li et al. 2016). We excluded contigs with less than a third of the average coverage, shorter than 500 bp, with GC-content above 33%, or with coverage over 1× for a Daphnia sample from a parasite-free pond of the same island (NCBI database; SRA accession: SRR26394761, Bioproject ID: PRJNA862292; Angst et al. 2024) using BBMap and seqtk.

Genome Annotation and Phylogenetic Inference

We annotated de novo genome assemblies using funannotate v.1.8.14 (Palmer and Stajich 2022) and associated software for gene prediction and functional annotation (Käll et al. 2004; Korf 2004; Majoros et al. 2004; Stanke et al. 2006, 2008; Haas et al. 2008; Frith 2011; Jones et al. 2014; Almagro Armenteros et al. 2019; Huerta-Cepas et al. 2019; Seppey et al. 2019; Brůna et al. 2020; Cantalapiedra et al. 2021). Before running funannotate on M. daphniae, we trained AUGUSTUS using BRAKER v.2.1.6 (Brůna et al. 2021) in the fungus mode, ProtHint v.2.6.0 (Brůna et al. 2020), and the fungal ortholog database (Kriventseva et al. 2019), which includes the microsporidia. When running funannotate, we used the built-in dikarya and microsporidia BUSCO gene sets (Seppey et al. 2019) and the pretrained AUGUSTUS model of the microsporidium Encephalitozoon cuniculi, and supplied transcript evidence from Rozella allomycis (James et al. 2013) and several microsporidia (AalgeraePRA109, AalgeraePRA339, EaedisUSNM41457, EcuniculiGBM1, MdaphniaeUGP3, NausubeliERTm2, NausubeliERTm6, NbombycisCQ1, NceranaeBRL01, NparisiiERTm1, NparisiiERTm3, OcolligataOC4, PneurophiliaMK1, Slophii42_110, ThominisUnknown, VcorneaeATCC50505, Vculicisfloridensis; Aurrecoechea et al. 2017) from the EuPathDB database. To predict protein functions, we used the combined evidence from InterProScan v.5.55_88.0 (Jones et al. 2014), eggNOG-mapper v.2.1.9 (Huerta-Cepas et al. 2019; Cantalapiedra et al. 2021), Phobius v.1.01 (Käll et al. 2004), SignalP v.5.0b (Almagro Armenteros et al. 2019), and antiSMASH v.6.0 (Blin et al. 2021). For phylogenetic inference, we downloaded genomes from a broad range of microsporidia (Aurrecoechea et al. 2017) including P. saccamoebae (Quandt et al. 2017), which represents a branch near the root of the microsporidian phylogeny that has retained a mitochondrion, along with Rozella allomycis (James et al. 2013) as an outgroup. We identified single-copy orthologs among the species using proteinortho v.6.3.0 (Klemm et al. 2023), aligned the individual genes using MAFFT v.7.508 (Katoh et al. 2002; Katoh and Standley 2013), trimmed the alignments using trimAl v.1.4.rev15 (Capella-Gutiérrez et al. 2009) with the -automated1 flag, and concatenated them. We inferred a maximum likelihood species tree from the concatenated sequences using IQ-TREE 2 v.2.2.0 (Minh et al. 2020) and a weighted phylogeny from the individual maximum likelihood gene trees using Weighted ASTRAL v.1.19.3.7 (Zhang and Mirarab 2022) with -r 16 -s 16 -x 100 -n 0 parameters, separately for all species and for the nine main study species. For the former alignment (3,915 bp), ModelFinder in IQ-TREE 2 identified the LG + F + R5 model as the best-fitting model, and for the latter (56,715 bp), the LG + F + R4 model.
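The concatenation of trimmed per-gene alignments into a single supermatrix can be sketched in Python as follows. The text does not name the tool used for this step, so the function, the data layout (one dictionary per gene mapping taxon to aligned sequence), and the gap-filling of absent taxa are illustrative assumptions. For strictly single-copy orthologs found in every species, the gap-filling branch would never be used.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def concatenate_alignments(alignments: List[Alignment], gap: str = "-") -> Alignment:
    """Join per-gene alignments column-wise into one supermatrix."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            # A taxon missing from this gene is padded with gap characters.
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy protein alignments for illustration only; not data from the study.
gene1 = {"sp_A": "MK-L", "sp_B": "MKAL", "sp_C": "MRAL"}
gene2 = {"sp_A": "GT", "sp_B": "GS"}
supermatrix = concatenate_alignments([gene1, gene2])
print(supermatrix)
```

Every taxon in the result has the same total length (here six columns), which is the property a concatenated maximum likelihood analysis requires of its input matrix.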

Comparative Genomics and Genome Ontology

We looked for homology and synteny across genomes using odp v.0.3.3 (Schultz et al. 2023). Specifically, we used the coordinates of the identified BUSCO genes to infer ancestral BUSCO gene linkage groups, test for conserved linkages among genomes, and create the ribbon diagram. To obtain modified base calls from ONT sequencing reads, we used megalodon v.2.5.0 (https://nanoporetech.github.io/megalodon), focusing on 5mC modifications for downstream analyses. We tested whether genomic annotations, including mRNA genes, rRNA genes, and different repeat types, were enriched with methylation sites by using the genome ontology peak annotation enrichment in homer v.5.1 (Heinz et al. 2010). This analysis considered sites to be methylated if read depth was above 3× and the percentage of modified reads was over 50%. The inclusion of multiple cell stages in our DNA isolation protocol explains, at least in part, variation in epigenetic modification between sequencing reads. For example, spores might have a different methylation pattern than sporonts, meronts, or other life stages (Mascarenhas dos Santos et al. 2023). For G. intestinalis, we used available ONT reads (NCBI database; SRA accession: SRR31799445, Bioproject accession: PRJNA1199805). We identified repeats using RepeatModeler v.2.0.2, including the LTR pipeline (Flynn et al. 2020), and RepeatMasker v.4.1.2 (Smit et al. 2013) and separately identified rRNA genes using cmscan of the Infernal pipeline v.1.1.2 (Nawrocki and Eddy 2013). From the RepeatMasker alignments, we created repeat landscapes using the RepeatMasker utility scripts calcDivergenceFromAlign.pl and createRepeatLandscape.pl. Finally, we summarized PFAM domains to identify the presence and absence of protein domains across all species using the compare module of funannotate, and used the tidyverse v.2.0.0 package (Wickham et al. 2019) in R v.4.4.1 (R Core Team 2024) for representation and statistics.
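The site-level methylation criterion stated above (read depth above 3× and more than 50% modified reads) can be written as a small Python sketch. The record layout and names are hypothetical and do not reflect the actual megalodon or homer file formats; only the two thresholds are taken from the text, and both are treated as strict inequalities.

```python
from typing import List, NamedTuple, Tuple


class SiteCall(NamedTuple):
    contig: str
    position: int
    depth: int                # number of reads covering the site
    percent_modified: float   # percentage of reads called as 5mC


def is_methylated(site: SiteCall, min_depth: int = 3, min_percent: float = 50.0) -> bool:
    """Apply the depth (> 3x) and modification (> 50%) thresholds to one site."""
    return site.depth > min_depth and site.percent_modified > min_percent


def methylated_sites(sites: List[SiteCall]) -> List[Tuple[str, int]]:
    """Return (contig, position) pairs of sites that pass both thresholds."""
    return [(s.contig, s.position) for s in sites if is_methylated(s)]


# Toy site calls for illustration only; not data from the study.
calls = [
    SiteCall("chr1", 100, 10, 80.0),   # passes both thresholds
    SiteCall("chr1", 200, 3, 100.0),   # depth is not above 3x
    SiteCall("chr1", 300, 12, 50.0),   # not over 50% modified
    SiteCall("chr2", 50, 4, 50.1),     # passes both thresholds
]
print(methylated_sites(calls))
```

Sites exactly at a threshold (depth of 3 or 50% modified) are excluded under this reading, which is why the second and third toy sites are not reported.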

Acknowledgments

We thank Jürgen Hottinger, Michelle Krebs, and Urs Stiefel for their help in the laboratory and all members of the Ebert group for providing feedback on the study and the manuscript. Alix Thivolle extracted DNA from D. pulex infected with C. obtusa. Christina Tadiri collected D. pulex infected with G. vavrai. Calculations were performed at the sciCORE (http://scicore.unibas.ch/) scientific computing center at the University of Basel. Suzanne Zweizig improved the language of the text.

Contributor Information

Pascal Angst, Department of Environmental Sciences, Zoology, University of Basel, Basel 4051, Switzerland.

Dieter Ebert, Department of Environmental Sciences, Zoology, University of Basel, Basel 4051, Switzerland.

Peter D Fields, Department of Environmental Sciences, Zoology, University of Basel, Basel 4051, Switzerland.

Supplementary Material

Supplementary material is available at Genome Biology and Evolution online.

Funding

This work was supported by the Swiss National Science Foundation (SNSF) (grant numbers 310030_188887 and 310030_219529 to D.E.).

Data Availability

Raw data are deposited in the NCBI SRA database, the assembled genomes and predicted protein sequences are available in NCBI GenBank (BioProject IDs PRJNA1199946, PRJNA1199805, and PRJNA1206819), and the trimmed alignments are deposited at https://doi.org/10.6084/m9.figshare.29715890.

Literature Cited

  1. Almagro Armenteros JJ, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol. 2019:37:420–423. 10.1038/s41587-019-0036-z.
  2. Angst P, Fields PD. DNA-extraction of Daphnia and symbionts. protocols.io. 2024. 10.17504/protocols.io.5jyl82n96l2w/v1.
  3. Angst P, Haag CR, Ben-Ami F, Fields PD, Ebert D. Genome-wide allele frequency changes reveal that dynamic metapopulations evolve differently. Mol Biol Evol. 2024:41:msae128. 10.1093/molbev/msae128.
  4. Angst P, Pombert J-F, Ebert D, Fields PD. Near chromosome–level genome assembly of the microsporidium Hamiltosporidium tvaerminnensis. G3 (Bethesda). 2023:13:jkad185. 10.1093/g3journal/jkad185.
  5. Angst P, Thivolle A, Haden Z, Wale N, Ebert D. Genomic analysis of the zooplankton-associated pathogenic bacterium Spirobacillus cienkowskii reveals its functional and metabolic capacities. Microb Genom. 2025:11:001463. 10.1099/mgen.0.001463.
  6. Aurrecoechea C, et al. EuPathDB: the eukaryotic pathogen genomics database resource. Nucleic Acids Res. 2017:45:D581–D591. 10.1093/nar/gkw1105.
  7. Blin K, et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 2021:49:W29–W35. 10.1093/nar/gkab335.
  8. Bojko J, et al. Microsporidia: a new taxonomic, evolutionary, and ecological synthesis. Trends Parasitol. 2022:38:642–659. 10.1016/j.pt.2022.05.007.
  9. Bojko J, Stentiford GD. Microsporidian pathogens of aquatic animals. In: Weiss LM, Reinke AW, editors. Microsporidia: current advances in biology. Springer International Publishing; 2022. p. 247–283. 10.1007/978-3-030-93306-7_10.
  10. Brugère J-F, Cornillot E, Méténier G, Bensimon A, Vivarès CP. Encephalitozoon cuniculi (Microspora) genome: physical map and evidence for telomere-associated rDNA units on all chromosomes. Nucleic Acids Res. 2000:28:2026–2033. 10.1093/nar/28.10.2026.
  11. Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 2021:3:lqaa108. 10.1093/nargab/lqaa108.
  12. Brůna T, Lomsadze A, Borodovsky M. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom Bioinform. 2020:2:lqaa026. 10.1093/nargab/lqaa026.
  13. Bushnell B. BBMap: a fast, accurate, splice-aware aligner (No. LBNL-7065E). Lawrence Berkeley National Lab. (LBNL); 2014. https://www.osti.gov/biblio/1241166-bbmap-fast-accurate-splice-aware-aligner
  14. Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 2021:38:5825–5829. 10.1093/molbev/msab293.
  15. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009:25:1972–1973. 10.1093/bioinformatics/btp348.
  16. Chen S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta. 2023:2:e107. 10.1002/imt2.107.
  17. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021:18:170–175. 10.1038/s41592-020-01056-5.
  18. Cornetti L, Fields PD, Du Pasquier L, Ebert D. Long-term balancing selection for pathogen resistance maintains trans-species polymorphisms in a planktonic crustacean. Nat Commun. 2024:15:5333. 10.1038/s41467-024-49726-8.
  19. Corradi N, Slamovits CH. The intriguing nature of microsporidian genomes. Brief Funct Genomics. 2011:10:115–124. 10.1093/bfgp/elq032.
  20. Danecek P, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021:10:giab008. 10.1093/gigascience/giab008.
  21. de Albuquerque NRM, Ebert D, Haag KL. Transposable element abundance correlates with mode of transmission in microsporidian parasites. Mob DNA. 2020:11:19. 10.1186/s13100-020-00218-8.
  22. Dexter E, et al. Rapid evolution of a large highly-divergent haplotype region during a bacterial epidemic. Under Review.
  23. Ebert D. Host–parasite coevolution: insights from the Daphnia–parasite model system. Curr Opin Microbiol. 2008:11:290–301. 10.1016/j.mib.2008.05.012.
  24. Elliott TA, Gregory TR. What's in a genome? The C-value enigma and the evolution of eukaryotic genome content. Philos. Trans. R. Soc. B, Biol. Sci. 2015:370:20140331. 10.1098/rstb.2014.0331.
  25. Fields PD, et al. Mitogenome phylogeographic analysis of a planktonic crustacean. Mol Phylogenet Evol. 2018:129:138–148. 10.1016/j.ympev.2018.06.028.
  26. Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 2020:117:9451–9457. 10.1073/pnas.1921046117.
  27. Frith MC. A new repeat-masking method enables specific detection of homologous sequences. Nucleic Acids Res. 2011:39:e23. 10.1093/nar/gkq1212.
  28. Garcia S, et al. The dynamic interplay between ribosomal DNA and transposable elements: a perspective from genomics and cytogenetics. Mol Biol Evol. 2024:41:msae025. 10.1093/molbev/msae025.
  29. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics (Oxford, England). 2013:29:1072–1075. 10.1093/bioinformatics/btt086.
  30. Haag KL, et al. Evolution of a morphological novelty occurred before genome compaction in a lineage of extreme parasites. Proc Natl Acad Sci U S A. 2014:111:15480–15485. 10.1073/pnas.1410442111.
  31. Haag KL, et al. Microsporidia with vertical transmission were likely shaped by nonadaptive processes. Genome Biol Evol. 2020:12:3599–3614. 10.1093/gbe/evz270.
  32. Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008:9:R7. 10.1186/gb-2008-9-1-r7.
  33. Heinz S, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010:38:576–589. 10.1016/j.molcel.2010.05.004.
  34. Hu J, et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 2024:25:107. 10.1186/s13059-024-03252-4.
  35. Hu J, Fan J, Sun Z, Liu S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 2020:36:2253–2255. 10.1093/bioinformatics/btz891.
  36. Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019:47:D309–D314. 10.1093/nar/gky1085.
  37. James TY, et al. Shared signatures of parasitism and phylogenomics unite cryptomycota and microsporidia. Curr Biol. 2013:23:1548–1553. 10.1016/j.cub.2013.06.057.
  38. Jespersen N, Monrroy L, Barandun J. Impact of genome reduction in microsporidia. In: Weiss LM, Reinke AW, editors. Microsporidia: current advances in biology. Springer International Publishing; 2022. p. 1–42. 10.1007/978-3-030-93306-7_1.
  39. Jones P, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014:30:1236–1240. 10.1093/bioinformatics/btu031.
  40. Käll L, Krogh A, Sonnhammer ELL. A combined transmembrane topology and signal peptide prediction method. J Mol Biol. 2004:338:1027–1036. 10.1016/j.jmb.2004.03.016.
  41. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002:30:3059–3066. 10.1093/nar/gkf436.
  42. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013:30:772–780. 10.1093/molbev/mst010.
  43. Khalaf A, et al. 2025 May 15. Forty new genomes shed light on sexual reproduction and the origin of tetraploidy in microsporidia [preprint]. bioRxiv 652816. 10.1101/2025.05.12.652816
  44. Khalaf A, Francis O, Blaxter ML. Genome evolution in intracellular parasites: microsporidia and apicomplexa. J Eukaryot Microbiol. 2024:71:e13033. 10.1111/jeu.13033.
  45. Klemm P, Stadler PF, Lechner M. Proteinortho6: pseudo-reciprocal best alignment heuristic for graph-based detection of (co-)orthologs. Front Bioinform. 2023:3:1322477. 10.3389/fbinf.2023.1322477.
  46. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019:37:540–546. 10.1038/s41587-019-0072-8.
  47. Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004:5:59. 10.1186/1471-2105-5-59.
  48. Kriventseva EV, et al. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 2019:47:D807–D811. 10.1093/nar/gky1053.
  49. Lang BF, et al. Mitochondrial genome annotation with MFannot: a critical analysis of gene identification and gene model prediction. Front Plant Sci. 2023:14:1222186. 10.3389/fpls.2023.1222186.
  50. Li  D, et al.  MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016:102:3–11. 10.1016/j.ymeth.2016.02.020. [DOI] [PubMed] [Google Scholar]
  51. Li  H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018:34:3094–3100. 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Majoros  WH, Pertea  M, Salzberg  SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004:20:2878–2879. 10.1093/bioinformatics/bth315. [DOI] [PubMed] [Google Scholar]
  53. Mandujano-González  V, Villa-Tanaca  L, Anducho-Reyes  MA, Mercado-Flores  Y. Secreted fungal aspartic proteases: a review. Revista Iberoamericana de Micología. 2016:33:76–82. 10.1016/j.riam.2015.10.003. [DOI] [PubMed] [Google Scholar]
  54. Manni  M, Berkeley  MR, Seppey  M, Simão  FA, Zdobnov  EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol.  2021:38:4647–4654. 10.1093/molbev/msab199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Mascarenhas dos Santos  AC, Julian  AT, Liang  P, Juárez  O, Pombert  J-F. Telomere-to-telomere genome assemblies of human-infecting Encephalitozoon species. BMC Genomics. 2023:24:237. 10.1186/s12864-023-09331-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Minh  BQ, et al.  IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol.  2020:37:1530–1534. 10.1093/molbev/msaa015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Nakjang  S, et al.  Reduction and expansion in microsporidian genome evolution: new insights from comparative genomics. Genome Biol Evol.  2013:5:2285–2303. 10.1093/gbe/evt184. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Nawrocki  EP, Eddy  SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013:29:2933–2935. 10.1093/bioinformatics/btt509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Palmer  J, Stajich  J. 2022. nextgenusfs/funannotate: Funannotate (Version 1.8.14) [Computer software]. Zenodo. [Accessed 2022 Sept 01]. https://zenodo.org/record/2604804.
  60. Parisot  N, et al.  Microsporidian genomes harbor a diverse array of transposable elements that demonstrate an ancestry of horizontal exchange with metazoans. Genome Biol Evol.  2014:6:2289–2300. 10.1093/gbe/evu178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Pombert  J-F, Haag  KL, Beidas  S, Ebert  D, Keeling  PJ. The Ordospora colligata genome: evolution of extreme reduction in microsporidia and host-to-parasite horizontal gene transfer. mBio. 2015:6:e02400–e02414. 10.1128/mBio.02400-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Poulin  R, Randhawa  HS. Evolution of parasitism along convergent lines: from ecology to genomics. Parasitology. 2015:142:S6–S15. 10.1017/S0031182013001674. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Quandt  CA, et al.  The genome of an intranuclear parasite, Paramicrosporidium saccamoebae, reveals alternative adaptations to obligate intracellular parasitism. Elife. 2017:6:e29594. 10.7554/eLife.29594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Quinlan  AR. BEDTools: the Swiss-army tool for genome feature analysis. Curr Protoc Bioinformatics.  2014:47:11.12.1–11.12.34. 10.1002/0471250953.bi1112s47. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. R Core Team . 2024. R: The R Project for Statistical Computing [Computer software]. R Foundation for Statistical Computing. [Accessed 2024 Jun 15]. https://www.r-project.org/.
  66. Refardt D, Decaestecker E, Johnson PTJ, Vávra J. Morphology, molecular phylogeny, and ecology of Binucleata daphniae n. g., n. sp. (Fungi: Microsporidia), a parasite of Daphnia magna Straus, 1820 (Crustacea: Branchiopoda). J Eukaryot Microbiol. 2008:55:393–408. 10.1111/j.1550-7408.2008.00341.x.
  67. Refardt D, Mouton L. Reverse arrangement of rRNA subunits in the microsporidium Glugoides intestinalis. J Eukaryot Microbiol. 2007:54:83–85. 10.1111/j.1550-7408.2006.00149.x.
  68. Roach MJ, Schmidt SA, Borneman AR. Purge haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics. 2018:19:460. 10.1186/s12859-018-2485-7.
  69. Schmid-Hempel P. Evolutionary parasitology: the integrated study of infections, immunology, ecology, and genetics. 2nd ed. Oxford University Press; 2021.
  70. Schultz DT, et al. Ancient gene linkages support ctenophores as sister to other animals. Nature. 2023:618:110–117. 10.1038/s41586-023-05936-6.
  71. Seppey M, Manni M, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness. In: Kollmar M, editor. Gene prediction: methods and protocols. Springer; 2019. p. 227–245. 10.1007/978-1-4939-9173-0_14.
  72. Smit A, Hubley R, Green P. 2013. RepeatMasker Open-4.0 [Computer software]. [Accessed 2021 Mar 19]. http://www.repeatmasker.org.
  73. South LR, Hurdeal VG, Fast NM. Genomics and phylogenetic relationships of microsporidia and their relatives. J Eukaryot Microbiol. 2024:71:e13051. 10.1111/jeu.13051.
  74. Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008:24:637–644. 10.1093/bioinformatics/btn013.
  75. Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006:7:62. 10.1186/1471-2105-7-62.
  76. Svedberg D, et al. Functional annotation of a divergent genome using sequence and structure-based similarity. BMC Genomics. 2024:25:6. 10.1186/s12864-023-09924-y.
  77. Vasimuddin M, Misra S, Li H, Aluru S. 2019. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. 2019 IEEE international parallel and distributed processing symposium (IPDPS). p. 314–324. 10.1109/IPDPS.2019.00041.
  78. Vávra J, Fiala I, Krylová P, Petrusek A, Hyliš M. Molecular and structural assessment of microsporidia infecting daphnids: the “obtusa-like” microsporidia, a branch of the monophyletic Agglomeratidae clade, with the establishment of a new genus Conglomerata. J Invertebr Pathol. 2018:159:95–104. 10.1016/j.jip.2018.10.003.
  79. Videvall E. Plasmodium parasites of birds have the most AT-rich genes of eukaryotes. Microb Genom. 2018:4:e000150. 10.1099/mgen.0.000150.
  80. Wadi L, Reinke AW. Evolution of microsporidia: an extremely successful group of eukaryotic intracellular parasites. PLoS Pathog. 2020:16:e1008276. 10.1371/journal.ppat.1008276.
  81. Walker BJ, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014:9:e112963. 10.1371/journal.pone.0112963.
  82. Weiss LM, Becnel JJ. Microsporidia: pathogens of opportunity. Wiley; 2014. 10.1002/9781118395264.
  83. Wickham H, et al. Welcome to the tidyverse. J Open Source Softw. 2019:4:1686. 10.21105/joss.01686.
  84. Williams BAP, Williams TA, Trew J. Comparative genomics of microsporidia. In: Weiss LM, Reinke AW, editors. Microsporidia: current advances in biology. Springer International Publishing; 2022. p. 43–69. 10.1007/978-3-030-93306-7_2.
  85. Wiredu Boakye D, et al. Decay of the glycolytic pathway and adaptation to intranuclear parasitism within Enterocytozoonidae microsporidia. Environ Microbiol. 2017:19:2077–2089. 10.1111/1462-2920.13734.
  86. Xiong X, et al. New insights into the genome and transmission of the microsporidian pathogen Nosema muscidifuracis. Front Microbiol. 2023:14:1152586. 10.3389/fmicb.2023.1152586.
  87. Ye Z, Jiang X, Pfrender ME, Lynch M. Genome-wide allele-specific expression in obligately asexual Daphnia pulex and the implications for the genetic basis of asexuality. Genome Biol Evol. 2021:13:evab243. 10.1093/gbe/evab243.
  88. Žárský V, et al. Contrasting outcomes of genome reduction in mikrocytids and microsporidians. BMC Biol. 2023:21:137. 10.1186/s12915-023-01635-w.
  89. Zhang C, Mirarab S. Weighting by gene tree uncertainty improves accuracy of quartet-based species trees. Mol Biol Evol. 2022:39:msac215. 10.1093/molbev/msac215.
  90. Zimin AV, et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res. 2017:27:787–792. 10.1101/gr.213405.116.

Supplementary Materials

evaf166_Supplementary_Data

Data Availability Statement

Raw data are deposited in the NCBI SRA database; the assembled genomes and the predicted protein sequences are available in NCBI GenBank (BioProject IDs PRJNA1199946, PRJNA1199805, and PRJNA1206819); and the trimmed alignments are deposited at https://doi.org/10.6084/m9.figshare.29715890.


Articles from Genome Biology and Evolution are provided here courtesy of Oxford University Press
