PLOS Biology. 2025 Oct 21;23(10):e3003446. doi: 10.1371/journal.pbio.3003446

Forty new genomes shed light on sexual reproduction and the origin of tetraploidy in Microsporidia

Amjad Khalaf 1,*, Chenxi Zhou 1,2, Claudia C Weber 1, Emmelien Vancaester 1, Ying Sims 1, Alex Makunin 1, Thomas C Mathers 1, Dominic E Absolon 1, Jonathan M D Wood 1, Shane A McCarthy 1, Kamil S Jaron 1, Mark Blaxter 1, Mara K N Lawniczak 1
Editor: Joseph Heitman 3
PMCID: PMC12558613  PMID: 41118362

Abstract

Microsporidia are single-celled, obligately intracellular parasites with growing public health, agricultural, and economic importance. Despite this, Microsporidia remain relatively enigmatic, with many aspects of their biology and evolution unexplored. Key questions include whether Microsporidia undergo sexual reproduction, and the nature of the relationship between tetraploid and diploid lineages. While few high-quality microsporidian genomes currently exist to help answer such questions, large-scale biodiversity genomics initiatives, such as the Darwin Tree of Life project, can generate high-quality genome assemblies for microsporidian parasites when sequencing infected host species. Here, we present 40 new microsporidian genome assemblies from infected arthropod hosts that were sequenced to create reference genomes. Out of the 40, 32 are complete genomes, eight of which are chromosome-level, and eight are partial microsporidian genomes. We characterized 14 of these as polyploid and five as diploid. We found that tetraploid genome haplotypes are consistent with autopolyploidy, in that they coalesce more recently than species, and that they likely recombine. Within some genomes, we found large-scale rearrangements between the homeologous genomes. We also observed a high rate of rearrangement between genomes from different microsporidian groups, and a striking tolerance for segmental duplications. Analysis of chromatin conformation capture (Hi-C) data indicated that tetraploid genomes are likely organized into two diploid units, similar to dikaryotic cells in fungi, with evidence of recombination within and between units. Together, our results provide evidence for the existence of a sexual cycle in Microsporidia, and suggest a model for the microsporidian lifecycle that mirrors fungal reproduction.


Microsporidia are single-celled, intracellular parasites of growing public health, agricultural, and economic importance. This study presents 40 new microsporidian genomes derived collaterally from Darwin Tree of Life sequencing of their arthropod hosts, revealing that tetraploid genomes are organised into two diploid compartments/nuclei, with recombination between and within nuclei.

Introduction

Biology is characterized by intimate interactions and a fundamental interdependence between organisms living in close proximity. Yet, our understanding of symbionts and cobionts (organisms sampled alongside a target organism) typically lags behind our understanding of their hosts. Primarily, this is due to difficulties in accessing an organism’s symbionts, growing them, and/or assessing their behavior. One such example is Microsporidia, which are single-celled, spore-forming, obligately intracellular parasites [1]. They were first described as the agent of a disease known as “pébrine” in farmed silkworms (Bombyx mori, Lepidoptera), which had caused crises in the industry along the Silk Road in the late 1800s [2,3]. Since then, Microsporidia have been identified in many different hosts, and are now appreciated as parasites of a broad range of metazoans, and even some protozoans [4,5]. Nine genera have been identified as human pathogens, especially in immunocompromised individuals, causing a range of symptoms such as diarrhea, encephalitis, keratitis, sinusitis, and myositis [6–10]. Similarly, arthropod-infecting microsporidians constitute a growing problem for beekeeping and aquaculture industries worldwide [11–14]. Other microsporidians are being explored as potential malaria transmission control agents, with evidence associating infection of Anopheles mosquitoes to a reduction in Plasmodium transmission [15–17].

Despite their importance, microsporidian biology remains fairly enigmatic, owing in part to their obligately intracellular nature, small size, low prevalence in most host populations, and biological quirks such as cryptomitosis (where the condensation and separation of chromatin into distinct chromosomes are unclear) [4,18,19]. One oddity that some microsporidian species share with diplomonads such as Giardia intestinalis is the persistence of two equivalent nuclei inside one cell, closely appressed against each other with distinct nuclear membranes and synchronous replication, for the whole lifecycle, a form known as a “diplokaryon” [20–23]. Some microsporidians remain monokaryotic for all their lifecycle [24], and others cycle between diplokaryotic and monokaryotic forms [21,25–28].

The cycling between monokaryotic and diplokaryotic states in some microsporidian species has widely been assumed to be part of a meiotic reproductive cycle, and phenomena interpreted as gametogenesis, plasmogamy, karyogamy, and synaptonemal complexes have been reported [21,24,29–40]. Whilst strictly diplokaryotic and strictly monokaryotic species were assumed to only undergo asexual mitosis [41], recent observations such as a very transient monokaryotic stage and synaptonemal complexes in diplokaryotic species suggest otherwise [42,43]. Population-level genetic data have also suggested the occurrence of recombination events in both monokaryotic and diplokaryotic species [44–50]. Although the presence of meiosis has never been validated beyond morphological data, homologs of genes involved in meiosis have been identified in many microsporidian species, with phylogenetic analysis indicating that all Microsporidia may have descended from a sexual fungal ancestor [18].

Furthermore, the ploidy of each microsporidian nucleus, and whether that changes during developmental or meiotic cycles, remains an open question. After the first discovery of polyploidy in Microsporidia, tetraploidy has been suggested to manifest in the diplokaryotic form [51], with species proposed to cycle between haploid-diploid or diploid-tetraploid states as part of their reproductive cycles [52]. Some doubt has been cast on this now that seven microsporidian species have been reported to be tetraploid, including both monokaryotic and diplokaryotic species [51,53,54]. Additionally, some microsporidians have species-specific lifecycle variants, such as the formation of an octet of spores as one infective unit (an “octospore”) [29,55], and it is unknown whether the ploidy of these is the same as their monokaryotic/diplokaryotic forms.

Ploidies other than haploid and diploid are common in nature but generally phylogenetically unstable, and are usually resolved back to effective diploidy through loss of one set of chromosomes or genome-wide rediploidisation leaving the signatures of whole genome duplication [56]. The high frequency of apparent tetraploids in Microsporidia [53] could be a reflection of a tendency towards formation of tetraploids or a reflection of a fundamental tetraploid state. It is unclear whether observed microsporidian polyploids arose through autopolyploidy, where the four genome copies derive from within-species events, or allopolyploidy, where genomes from different species are combined through hybridization (reviewed in [57]). Furthermore, it is unknown whether tetraploidy in Microsporidia is characterized by a single ancient event that has been stably maintained across multiple lineages, an outcome that would be highly unusual given that rediploidisation is typically a key process that follows whole-genome duplications [58–66], or whether polyploidy has arisen independently in each microsporidian lineage where it is observed.

While the number of microsporidian genome assemblies is increasing, few are high-quality, chromosome-level assemblies. High-quality assemblies are crucial to disentangling questions about the occurrence of sexual reproduction, the origins of polyploidy, and the interaction of the two [67]. Large-scale reference genome and next-generation sequencing initiatives, such as the Darwin Tree of Life (DToL) [68], can incidentally generate high-quality genome assemblies for cobionts when sequencing host species, and thus offer an unrivaled data-generation opportunity for rare and unculturable endosymbionts. For instance, DToL has recently released over 100 novel Wolbachia genomes and two cnidarian endoparasite genomes assembled from data arising from individual hosts targeted for reference genomes [69,70]. Some screens for Microsporidia have also been carried out on DToL data, yielding a few microsporidian genome sequences [67,71,72].

Here, we present microsporidian genomic data from 40 host organisms sequenced at the Wellcome Sanger Institute as part of DToL. We recover eight partial and 32 complete (or nearly complete, with Benchmarking Universal Single-Copy Orthologues [BUSCO] completeness scores >70%) microsporidian genomes. Eight of our complete genomes are chromosome-level assemblies, seven of which, to our knowledge, were scaffolded with the first Hi-C data generated for Microsporidia. These new genomes represent much of the breadth of currently described microsporidian diversity, with genomes from five of the seven microsporidian clades named by Bojko and colleagues [4]. We show that one tetraploid genome is organized into two units, likely the nuclei of the diplokaryon, but also show evidence of historical recombination between all four genomes. We describe rearrangements between the haplotypes of some genomes, chromosomal rearrangements in microsporidian evolution, and a high tolerance for segmental duplications. We recognize recombination signatures in other tetraploid genomes, and propose a model to synthesize our observations of ploidy, reproduction, and the microsporidian lifecycle.

Results

Forty new microsporidian genome assemblies

We identified host organisms carrying microsporidian infections by screening the raw genome sequencing data generated by DToL for microsporidian sequences, or by PCR amplification of microsporidian targets from host DNA extracts before genome sequencing. We also observed microsporidian-like sequences in additional genomes that proved to be horizontal DNA transfers into the host genome in the absence of current live infections. From 40 host species, we recovered 32 complete microsporidian genome assemblies with BUSCO completeness score >70%, including eight chromosome-level genomes, seven of which were scaffolded with Hi-C data. We also assembled eight partial genome sequences, with BUSCO completeness score <70%. The recovered microsporidian genomes come from hosts belonging to eight insect orders, with lepidopteran and dipteran hosts yielding 13 and 12 microsporidian genomes, respectively (Fig 1). The hosts include several not previously known to be infected by microsporidia, including Loensia variegata (Psocodea, bark lice), Vulgichneumon bimaculatus (Hymenoptera, ichneumon wasp), and Delia platura (Diptera, seedcorn maggot fly) (S1 Table). Our sample size is small, but we note that most microsporidian genomes from lepidopteran hosts are derived from the microsporidian group Nosematida (77%, n = 10), and most microsporidian genomes from dipteran hosts are derived from Amblyosporida (67%, n = 8). While some microsporidians are known to distort the sex of their host populations [74], we found no evidence of such in our limited sampling (S1 Fig). The full list of hosts, their recovered microsporidian genome assemblies, and associated genome summary statistics are given in S1 Table. See Materials and methods for steps taken to confirm that the microsporidian genomes come from single species and not mixed infections.
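
The completeness categories used above can be expressed as a small rule. The following is a minimal sketch, not the pipeline used in this study: the function name is hypothetical, and because the text only gives ">70%" and "<70%", treating a score of exactly 70% as partial is an assumption.

```python
def classify_assembly(busco_complete_pct: float, chromosome_level: bool = False) -> str:
    """Label an assembly from its BUSCO completeness score (in percent).

    Assumption: a score of exactly 70% is treated as 'partial', since the
    text only specifies >70% (complete) and <70% (partial).
    """
    if not 0.0 <= busco_complete_pct <= 100.0:
        raise ValueError("BUSCO completeness must be between 0 and 100")
    if busco_complete_pct > 70.0:
        # Chromosome-level genomes are a subset of the complete genomes.
        return "chromosome-level" if chromosome_level else "complete"
    return "partial"


# Hypothetical scores, for illustration only.
print(classify_assembly(92.5))                         # complete
print(classify_assembly(88.0, chromosome_level=True))  # chromosome-level
print(classify_assembly(41.3))                         # partial
```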

Fig 1. Prevalence of Microsporidia in DToL insect genomes.

Microsporidian genomes recovered from insect hosts, split by taxonomic order. F: female, M: male, U: unspecified sex. The silhouettes used in this figure were taken from https://www.phylopic.org, and are all under CC0 1.0 Universal Public Domain Dedication. Credits: Ephemeroptera, Nathan Jay Baker; Psocodea, Christina N. Hodson; Hemiptera, Dave Angelini; Hymenoptera, Emma Kärrnäs; Coleoptera, Kanako Bessho-Uehara; Diptera, Christina N. Hodson; Trichoptera, Christoph Schomburg; and Lepidoptera, Andy Wilson. The data underlying this figure can be found in S1 Table. The figure was generated using ToyTree [73], and manually annotated using InkScape (version 1.2.2).

The genome sequences were placed in five of the seven microsporidian clades identified by Bojko and colleagues [4]. Half derived from Nosematida (two chromosome-level genomes, 13 complete genomes, five partial genomes) and 11 from Amblyosporida (six chromosome-level genomes, five complete genomes) (Figs 1 and 2). Four belonged to Enterocytozoonida (one chromosome-level genome, two complete genomes, one partial genome), two to Neopereziida (two partial genomes), and three to Glugeida (three complete genomes) (Figs 1 and 2). Most chromosome-level and complete genome assemblies presented here have comparable or higher contiguity and BUSCO completeness compared to previously published microsporidian genome assemblies (Fig 2). Our complete genome sequences range in span from 2.35 Mb to 56 Mb. The eight largest genome assemblies are unpurged due to their read depth being too low to run purge_dups [80], or because of the presence of severe between-haplotype rearrangements, or due to the lack of Hi-C data; and thus retain haplotypic duplication (which also results in the high BUSCO duplication scores observed in Fig 2). The largest purged assembly (idChiSpeb1.µ, see Materials and methods for an explanation of the naming system used) is 20 Mb (Fig 2). This falls within the range of previously sequenced microsporidian genomes, which range in size from 2.2 Mb (Encephalitozoon romalae) to 51.3 Mb (Edhazardia aedis) [81].

Fig 2. 600 gene phylogeny of Microsporidia.

(A) ASTRAL [75] phylogeny summarizing individual phylogenies of 600 BUSCO genes (microsporidia_odb10) [76] across all publicly available microsporidian genome assemblies (excluding multiple strains where they are available), and the genome assemblies generated in this study (n = 40, marked in purple). The full phylogeny with all publicly available genomes, including different strains, is found in S3 Fig. Branch lengths were estimated with IQ-TREE using a concatenated alignment of the individual BUSCOs [77]. Nodes with less than 95% support are marked with pink circles. Ploidy is marked in circles at the tips of the tree for genomes where it was characterizable. (B) Genome assembly span (Mb) as calculated by assembly-stats (Github: https://github.com/sanger-pathogens/assembly-stats), with black circles marking chromosome-level genome assemblies. (C) N50 values (Mb) as calculated by assembly-stats (Github: https://github.com/sanger-pathogens/assembly-stats), with asterisks marking purged genome assemblies. (D) BUSCO gene (microsporidia_odb10) completeness percentage, marked in green for single-copy genes, and beige for duplicated genes. (E) Transposable element percentage as predicted by RepeatModeler and RepeatMasker [78,79], marked in burgundy for retroelements, peach for DNA transposons, and blue for rolling circles. Neop.: Neopereziida; Or. Lin.: Orphan Lineage. The data underlying (A) can be found in S1 Text. The data underlying (B), (C), (D), and (E) can be found in S1 Table. The figure was generated using ToyTree [73], and manually annotated using InkScape (version 1.2.2).

Ploidy was estimated using GenomeScope2 and Smudgeplot [82], following criteria we outlined previously [53]. In brief, we estimated ploidy for genomes with average read depth exceeding 20× (see Materials and methods for full explanation). As a result, we were able to characterize 14 of our genomes as polyploid, and five as diploid (Fig 2). We note that some microsporidian genomes had large segments differing in copy number from the majority of the genome. These occurred both within and between chromosomes. The duplications varied in length, and preferentially occurred towards the ends of contigs or chromosomes (see S2 Fig for details on how these were identified and examples). Such cases are common, with some level of segmental duplication observed in nearly all of the 14 polyploid genomes (see File Collection 1 at https://doi.org/10.5281/zenodo.17251512). No other phylogenetic or host metadata unites those genomes.

We also performed repeat annotation (with RepeatModeler and RepeatMasker) on all genomes [78,79] so that all genomes were annotated congruently. To assess the relationship between the phylogeny and transposable element loads, and genome spans, we computed transformations representing the fit of each feature with the tree’s topology (λ), branch-lengths (κ), and root–tip distance (δ) [83] (S3 Table). For the majority of examined traits, we found no significant phylogenetic signal, with the exception of a strong root–tip distance effect on all transposable element loads. Retroelement load and DNA transposon loads were moderately correlated to one another and genome span, whereas helitron load was weakly correlated to all other features (S4 Table). We also explored the distribution of transposable elements and rRNA sequences along chromosome-level genomes (File Collections 2 and 3 at https://doi.org/10.5281/zenodo.17251512). Whilst in Encephalitozoons transposable elements and rRNAs clustered in subtelomeric regions [84], the majority of chromosomes from other genomes did not exhibit a similar pattern.

The phylogenetic tree suggests a revision of microsporidian relationships. We confidently placed Nosematida as a sister group to Enterocytozoonida, and the group known as ‘orphan lineage’ (containing the genera Hamiltosporidium and Astathelohania) as a sister group to Amblyosporida, in agreement with previous whole-genome phylogenies [4,85,86]. However, with moderate confidence (node with 87% support, see newick string in S1 Text), we place Glugeida as a sister group to the ancestor of the orphan lineage and Amblyosporida. The previously contested place of Neopereziida as a sister group to the ancestor of Nosematida and Enterocytozoonida was confidently confirmed (Fig 2).

The classification of species traditionally assigned to Nosema, such as Nosema (=Vairimorpha) ceranae and Nosema (=Vairimorpha) apis, has been the subject of ongoing debate [4,54,87–89]. Our phylogeny, using 600 loci, strongly supports the split between Vairimorpha and Nosema, with Nosema (=Vairimorpha) ceranae clustering robustly with other Vairimorpha genomes (Fig 2).

Operational taxonomic unit (OTU) classification of species suggests autotetraploidy in Microsporidia

Morphological and histopathological data are usually employed to identify microsporidia to species level, but no such data were available for these newly assembled microsporidian genomes. We measured phylogenetic branch lengths between every possible combination of two genomes to establish a proxy baseline for genomic disparity among individuals that are known to be members of the same species based on morphological, histopathological, or cell culture data. We then used this baseline to assess whether any of the new genomes could be diagnosed as members of known species, and whether the homeologous subgenomes of our tetraploid genomes could be characterized as belonging to the same species (autotetraploidy) or not (allotetraploidy). We note that morphological and histopathological data remain important for species delineation, and such approaches may not be applicable to organisms where the traditional species concept may not apply.

Publicly available genomes classified to the same species exhibited high genomic similarity, with an average branch length distance of 0.01 amino acid substitutions per site (using the BUSCO-based phylogeny, see S3 Fig). On the other hand, genomes from different species were much more divergent, with an average branch length of 0.577 amino acid substitutions per site. One pair of species, Hamiltosporidium tvaerminnensis and Hamiltosporidium magnivora, had particularly short branch lengths (0.012–0.017 amino acid substitutions per site). Excluding these, the smallest branch length between different species within the same genus was 0.15 amino acid substitutions per site.

Relying on these data, we established a conservative branch length threshold to classify genomes as belonging to the same species based on the smallest distance between H. tvaerminnensis and H. magnivora genomes (0.012 amino acid substitutions per site). Using this criterion, we classified 17 of the new genomes as belonging to a known species or an unnamed species formed by two or more of our genomes (S6 Table). If within-species divergence was permitted to extend to the full range observed within species (i.e., to 0.032 substitutions per site), one additional new genome was classified as being a likely member of a named species. From these analyses, we identified five species-like groupings of otherwise unidentified microsporidia sequenced here (S6 Table). Those include gmOTU1 comprising iuLoeVari1.µ (from host Loensia variegata [Psocodea]), idNerComm1.µ (from host Neria commutata [Diptera]), and idMegDoli1.µ (from host Megamerina dolium [Diptera]); and gmOTU2 from five dipteran hosts: idDelPlat3.µ (from host Delia platura [Diptera]), idTanUsma1.µ (from host Tanytarsus usmaensis), idDelPlat4.µ (from host Delia platura), idLucSpea1.µ (from host Lucilla sp.), and idDelPlat5.µ (from host Delia platura). For an explanation of the naming system used for OTUs, see Materials and methods.
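
The threshold-based grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: genomes are joined by single-linkage clustering (any pair at or below the threshold is merged), the comparison is inclusive, and the genome names and distances in the example are hypothetical. Only the two threshold values (0.012 and 0.032 substitutions per site) come from the text.

```python
STRICT_THRESHOLD = 0.012   # conservative threshold from the text
RELAXED_THRESHOLD = 0.032  # full within-species range from the text


def group_by_branch_length(distances, threshold):
    """Group genomes whose pairwise branch length is at or below `threshold`.

    `distances` maps frozenset({genome_a, genome_b}) to a branch length in
    amino acid substitutions per site. Single-linkage merging is an assumption.
    """
    genomes = sorted({g for pair in distances for g in pair})
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for pair, dist in distances.items():
        a, b = sorted(pair)
        if dist <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical genomes and distances, for illustration only.
example = {
    frozenset({"g1", "g2"}): 0.005,
    frozenset({"g1", "g3"}): 0.020,
    frozenset({"g2", "g3"}): 0.025,
    frozenset({"g1", "g4"}): 0.400,
    frozenset({"g2", "g4"}): 0.410,
    frozenset({"g3", "g4"}): 0.390,
}
print(group_by_branch_length(example, STRICT_THRESHOLD))
print(group_by_branch_length(example, RELAXED_THRESHOLD))
```

In this toy input, the third genome joins the first two only under the relaxed threshold, mirroring how one additional genome was assigned to a named species when the threshold was extended.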

In this context, we also explored within-individual BUSCO divergences between homeologous subgenomes within tetraploid assemblies. We found that in all but one case at least 85% of homeologous gene pairs were separated by distances smaller than the relaxed threshold (0.032 substitutions per site) used for species delineation (Fig 3), consistent with autotetraploidy. The 15% of more divergent homeologous gene pairs were scattered across contigs in the assembly, suggesting they are isolated cases of increased divergence (Oxford dot plots in File Collection 4 at https://doi.org/10.5281/zenodo.17251512). In the remaining case, ilAceEphe1.µ (from host Acentria ephemerella [Lepidoptera]), over 40% of homeologous gene pairs exceeded the same-species divergence threshold (Fig 3A) and many of these divergent homeologues were segregated on separate contigs (Fig 3B). We note that while we are unable to distinguish between gene copies arising from polyploidisation events versus other segmental duplication events for any genome, this high proportion of divergent homeologues in ilAceEphe1.µ may indicate it is the product of a recent hybridization event between two related diploid individuals (i.e., an allotetraploid). Furthermore, this pattern in ilAceEphe1.µ is unlikely to be the result of a mixed infection of two diploid individuals as the read depths of the four subgenomes are congruent with them belonging to one single genome (File Collection 1 at https://doi.org/10.5281/zenodo.17251512).
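
The percentage reported for each tetraploid genome (the share of homeologous gene pairs exceeding the same-species threshold) is a simple proportion. A minimal sketch follows, assuming the branch lengths have already been extracted from the per-locus phylogenies; the function name and the example values are hypothetical, and this is not the custom script in S1 Script.

```python
def fraction_exceeding(branch_lengths, threshold=0.032):
    """Fraction of homeologous gene pairs whose branch length exceeds `threshold`.

    `branch_lengths` is an iterable of distances in amino acid substitutions
    per site; 0.032 is the relaxed same-species threshold from the text.
    """
    lengths = list(branch_lengths)
    if not lengths:
        raise ValueError("at least one gene pair is required")
    return sum(1 for d in lengths if d > threshold) / len(lengths)


# Hypothetical branch lengths, for illustration only.
pairs = [0.01, 0.02, 0.05, 0.10]
print(f"{fraction_exceeding(pairs):.0%} of pairs exceed the threshold")
```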

Fig 3. Pairwise phylogenetic branch lengths between homeologous gene pairs in tetraploid genomes.

(A) Histograms showing phylogenetic branch lengths (in amino acid substitutions per site) between homeologous gene pairs for tetraploid genomes. The relaxed branch length threshold for species delineation is highlighted in a dashed red line (0.032 amino acid substitutions per site). The percentage of gene pairs that exceed this same-species threshold is given in a box in the top right of each plot. (B) Oxford dot plot of tetraploid ilAceEphe1.µ (from host Acentria ephemerella [Lepidoptera]) using BUSCO genes. Contig boundaries are marked by gray lines. Gene pairs that are less divergent than the same species threshold are in sky blue, while gene pairs that are more divergent than the same species threshold are in red. The data underlying (A) was generated by running BUSCO (microsporidia_odb10, version 5.4.6) [76] on the unpurged genome assemblies of the tetraploid genomes. For each tetraploid, the haplotypes of each BUSCO locus were aligned to one another and an outgroup using MAFFT (version 7.525) [90], and a phylogeny was generated for each alignment using IQ-TREE (version 2.3.4, with ModelFinder enabled) [77, 91]. Subsequently, the branch lengths between homeologous gene pairs were extracted from each phylogeny, and plotted in the histograms seen in (A) using a custom script in S1 Script. The individual BUSCO phylogenies used to derive this data can also be found in File Collection 15 at https://doi.org/10.5281/zenodo.17251512. The BUSCO gene annotations used to generate (B) can be found in File Collection 5 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

Homeologous genomes coalesce more recently than do species in most tetraploid Microsporidia

With the discovery of tetraploidy in many microsporidian lineages [51, 53, 54], one of the key questions is whether tetraploidy is an ancient event shared between multiple (or all) tetraploid microsporidian lineages and subsequently lost by diploid lineages nested within ancestrally tetraploid clades, or is phylogenetically recent and the result of multiple, independent lineage-specific events. The generation of resolved tetraploid genome assemblies allows us to address this question. In the absence of recombination, gene conversion, and sexual reproduction, under ancient tetraploidy we should find that the two homeologous genomes within a tetraploid species have a coalescence deeper than that of the homologous genomes compared between related species [93]. However, if recombination among homeologous genomes occurs, the homeologous genomes within an individual may on average coalesce more recently than they do between species, even while some loci retain a signal of the deep coalescence of the lineages that contributed to the tetraploid. On the other hand, if tetraploidy is the product of recent, independent events, haplotypes will coalesce more recently than species, even in the absence of recombination, and there will be no deep coalescence signal.

We performed the Approximately Unbiased statistical test [94] on multi-copy BUSCO gene phylogenies for all pairwise combinations of tetraploid microsporidian genomes. We found that the haplotypes in each tetraploid were more similar to each other than they were to genomes from other species (i.e., haplotypes coalesce more recently than species) (Fig 4). In three species where we had multiple high-contiguity assemblies, the homeologous genomes had coalescences deeper than the within-species coalescence of homologs, implying a single origin of tetraploidy at the base of the species. Thus, the three Vairimorpha ceranae genomes, the two genomes assessed in gmOTU1 and the four genomes assessed in gmOTU2 had tetraploid origin coalescences deeper than the within-species coalescence (Fig 4). Interestingly, the three Anncaliia algerae genomes appear more distinct, suggesting reduced gene flow between the sampled individuals (Fig 4). As noted above, ilAceEphe1.µ may have arisen from a hybridization event (Fig 3). However, its diploid ancestors were likely more closely related to each other than they were to other tetraploid genomes sampled in our study, as we observed nearly all coalescences between the homeologues within the tetraploid occurring more recently than they do with other genomes (Fig 4).
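
The idea of "homeologues coalescing more recently than species" can be illustrated with a plain topology check on rooted gene trees. The sketch below is not the Approximately Unbiased test, which compares the likelihoods of alternative topologies; it only asks whether all copies from one genome form a clade. The nested-tuple tree representation and the 'genome|copy' tip labels are hypothetical conventions chosen for this example.

```python
def count_tips(tree, genome):
    """Number of tips under `tree` that belong to `genome`."""
    if isinstance(tree, str):
        return 1 if tree.split("|")[0] == genome else 0
    return sum(count_tips(child, genome) for child in tree)


def tip_genomes(tree):
    """Set of genome names found among the tips under `tree`."""
    if isinstance(tree, str):
        return {tree.split("|")[0]}
    found = set()
    for child in tree:
        found |= tip_genomes(child)
    return found


def copies_form_clade(tree, genome):
    """True if some clade holds every copy of `genome` and no other genome."""
    total = count_tips(tree, genome)
    if total < 2:
        return False  # uninformative: fewer than two copies in this gene tree

    def search(node):
        if isinstance(node, str):
            return False
        if tip_genomes(node) == {genome} and count_tips(node, genome) == total:
            return True
        return any(search(child) for child in node)

    return search(tree)


def fraction_recent_coalescence(trees, genome):
    """Share of informative gene trees in which `genome`'s copies form a clade."""
    informative = [t for t in trees if count_tips(t, genome) >= 2]
    if not informative:
        raise ValueError("no gene tree has two or more copies of this genome")
    return sum(copies_form_clade(t, genome) for t in informative) / len(informative)


# Hypothetical rooted gene trees for two tetraploid genomes X and Y.
recent = ((("X|1", "X|2"), ("Y|1", "Y|2")), "out|1")  # copies group by genome
deep = ((("X|1", "Y|1"), ("X|2", "Y|2")), "out|1")    # copies group across genomes
print(fraction_recent_coalescence([recent, deep], "X"))
```

A fraction above one half would correspond to the green cells of the heatmap in Fig 4, where most genes support a more recent homeologue coalescence.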

Fig 4. Proportion of multi-copy genes which coalesce prior to genomes.

Heatmap showing the fraction of genes that support a more recent homeologue coalescence than between-species coalescence. Fractions greater than 50% are indicated in green, whereas fractions lower than 50% are indicated in purple. The phylogeny is an ASTRAL [75] phylogeny summarizing individual phylogenies of 600 BUSCO genes (microsporidia_odb10) [76] from all publicly available tetraploid assemblies and the tetraploid assemblies generated in this study. The branch lengths were estimated using a concatenated alignment of the individual BUSCOs used, with IQ-TREE [77]. The phylogeny is congruent with the phylogeny in Fig 2. The data underlying this figure can be found in S2 Text. The figure was generated using Matplotlib [92] and ToyTree [73], and manually annotated using InkScape (version 1.2.2).

Because the homeologous subgenomes of all tetraploid microsporidian genomes coalesce more recently than species, we cannot distinguish between two scenarios: ancient tetraploidy in a system where recombination, gene conversion, and/or sexual reproduction homogenize the two subgenomes, and recent tetraploidy in a system that may or may not undergo recombination and/or sexual reproduction.

Microsporidian genomes may carry signals of recent recombination

While our coalescence analyses above (Fig 4) and previous studies suggest that Microsporidia may undergo sexual reproduction [21,24,29–40,42–45], unequivocal signals of recombination have yet to be observed genomically. Phased chromosome-level genomes with known ploidies can help address this by identifying runs of homozygosity between otherwise differentiated homologous chromosomes that may have arisen by recent sexual or non-sexual (i.e., gene conversion) recombination.

For a purged tetraploid genome where all four copies of the genome were reconstructed (iuLoeVari1.µ from host Loensia variegata [Psocodea]), we assessed the nucleotide identity patterns between the homologous copies of the largest chromosome. It is striking that this analysis did not identify two pairs of diverged homeologues, but instead suggested a mosaic pattern of pairwise similarity (Fig 5). In line with this, we found nearly 20% of the tetraploid genome collapsed in the genome assembly (estimated haploid size is ~17 Mb, whereas tetraploid assembly span is 54.7 Mb). The collapsed regions identified are at the ends of chromosomes, where recombination rates are higher in many organisms [100,101]. Similar patterns were also observed in other high-contiguity genomes (File Collections 6 and 7 at https://doi.org/10.5281/zenodo.17251512). These observations are consistent with signatures of recent recombination.
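
One way to arrive at a collapsed fraction of this size is to compare the assembly span with the span expected if all four copies were fully resolved. The sketch below works through that arithmetic with the two figures quoted above (~17 Mb haploid size, 54.7 Mb assembly span); the assumption that the expected span is simply four times the haploid size is ours, and the function name is hypothetical.

```python
def collapsed_fraction(haploid_size_mb, assembly_span_mb, ploidy=4):
    """Fraction of the expected polyploid span that is missing from the assembly.

    Assumes the fully resolved span would be ploidy * haploid size.
    """
    expected_span_mb = ploidy * haploid_size_mb
    return 1.0 - assembly_span_mb / expected_span_mb


# Values quoted in the text: ~17 Mb haploid size, 54.7 Mb tetraploid assembly.
frac = collapsed_fraction(17.0, 54.7)
print(f"Expected span: {4 * 17.0:.1f} Mb, collapsed: {frac:.1%}")
```

With these inputs the expected span is 68 Mb and the shortfall is a little under 20%, in line with the figure given above. Because the haploid size is itself an estimate, the result should be read as approximate.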

Fig 5. Nucleotide identity (%) between tetraploid iuLoeVari1.µ haplotypes.

Each haplotype was compared to the other three haplotypes using minimap2 [95], and nucleotide identity (%) between them was plotted for each reference. Each haplotype has a mosaic pattern of identity to the others. The gray shaded area represents a “missing” segment of chromosome 1D, which we suggest is identical to and thus coassembled as the corresponding portion of chromosome 1C, which has double the expected coverage. The top panel of each plot shows mapped read coverage, and the middle panel displays GC content along the chromosome, with average GC content marked by a dashed red line. The coverage data underlying this figure was generated by mapping the PacBio reads against the genome using minimap2, and extracting read depth data using samtools and bedtools [96,97]. The GC data was generated by running seqkit fx2tab [98] on the genome. The genome can be found in File Collection 12 at https://doi.org/10.5281/zenodo.17251512. The genome’s BioSpecimenID can be found in S1 Table, and can be used to retrieve the associated PacBio reads from NCBI [99]. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

The tetraploid microsporidian cell contains two diploid units

Given the diversity of nuclear conditions in Microsporidia, we sought to understand 3D genome architecture in tetraploids, to determine whether homeologous subgenomes reside in distinct units (such as the nuclei of diplokaryons). Thus, we leveraged the availability of high coverage Hi-C data for one of our tetraploid microsporidian genomes, iuLoeVari1.µ from host Loensia variegata [Psocodea], to assess physical interactions (chromatin proximities) between the four subgenomes inside the microsporidian cell.

The Hi-C data indicated that for each chromosome, the four homeologous copies have a “two plus two” association (Fig 6). In addition, looking between chromosomes, these pairs are in turn more likely to interact with other pairs, and the whole genome can be partitioned into two diploid units, each containing 20 chromosomes (Fig 6). Furthermore, the signals consistent with recombination identified above (Fig 5) occur between the homeologous copies of chromosome 1 within and between these units. Taken together, our results suggest that the microsporidian tetraploid genome occurs in two recombining diploid units.

Fig 6. Hi-C heatmap for the tetraploid genome of iuLoeVari1.µ.

Hi-C contact maps are heatmaps that visualize the frequency of physical contacts between genomic regions in 3D-space. Regions that are closer together physically tend to show more interactions, appearing as darker colors on the map. The strongest signal, in dark red here, is always found along the diagonal, which represents self–self interactions (i.e., each genomic region interacting with itself and nearby regions along the same chromosome). Off-diagonal signals represent interactions between different chromosomes. (A) Hi-C contact map of the tetraploid iuLoeVari1.µ genome (host Loensia variegata [Psocodea]). Each chromosome, with its four copies, is highlighted by a yellow box. (B) Hi-C contact map showing the interactions amongst the four copies of chromosome 1 and the four copies of chromosome 2. Green lines highlight interactions belonging to unit 1, and purple lines highlight interactions belonging to unit 2. Dotted lines indicate interactions between chromosomes 1 and 2. (C) Summary metrics for the genome assemblies of units A/B and C/D. The data underlying this figure was generated by mapping the Hi-C reads to the genome using the sanger-tol/curationpretext pipeline [102] (excluding multi-mapping reads). The genome can be found in File Collection 12 at https://doi.org/10.5281/zenodo.17251512. The genome’s BioSpecimenID can be found in S1 Table, and can be used to retrieve the associated Hi-C reads from NCBI [99]. The figure was generated using PretextView [103], and manually annotated using InkScape (version 1.2.2).
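
To make the "two plus two" idea concrete, the sketch below takes summed Hi-C contact counts between the four copies of one chromosome and returns the pairing with the most within-pair contacts. This is an illustrative simplification, not the procedure used in this study (the maps were inspected in PretextView); the function name and the toy counts are invented for the example.

```python
def best_two_plus_two(contacts):
    """contacts maps frozenset({x, y}) -> contact count between copies x and y (A-D)."""
    # Four copies can be split into two pairs in exactly three ways.
    partitions = [
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
        (("A", "D"), ("B", "C")),
    ]

    def within_pair_contacts(partition):
        return sum(contacts[frozenset(pair)] for pair in partition)

    return max(partitions, key=within_pair_contacts)


# Made-up counts in which A/B and C/D interact most strongly.
toy = {
    frozenset("AB"): 120, frozenset("CD"): 110,
    frozenset("AC"): 30, frozenset("AD"): 25,
    frozenset("BC"): 28, frozenset("BD"): 32,
}
units = best_two_plus_two(toy)
```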

No evidence of recent rediploidisation in Microsporidia

Given we do not see deep coalescence of homeologues in tetraploid microsporidians, and cannot distinguish ancient from recent tetraploidisation events in the group, we sought to test whether diploid lineages showed any signal of an ancient tetraploid state. If tetraploidy was ancestral to Microsporidia or to major lineages within the group, the diploid lineages derived from tetraploid ancestry should show evidence of rediploidisation. Rediploidisation could be achieved by reestablishment of a diploid karyotype by selective loss of one set of chromosomes or through meiotic reduction division without subsequent fertilization. Alternatively, diploidy could be restored piecemeal by loss or subfunctionalisation of the homeologous copies of each gene [104–108]. Piecemeal rediploidisation should leave a signal in the age of retained homeologous gene pairs, reflecting the divergence between the parental genomes, which would appear as paralogues in the diploidised genome.

We explored the age distribution of likely paralogous BUSCO gene pairs (homologous genes originating from a gene duplication event, called using wgd [104]) in the diploid microsporidian genomes. Relative age was estimated as the synonymous divergence (Ks) between the gene pairs. Piecemeal rediploidisation should be evident as a peak of gene duplicates of the same age [104–108]. Restoration of diploidy through reductive division would not be expected to leave behind such a signal, as the remaining paralogous gene pairs would have had independent origins. Similarly, the absence of an ancient tetraploid state in diploid lineages would not be expected to leave behind such a signal. We found no peaks of shared divergence in the diploid genomes (Fig 7). This suggests that tetraploidy in Microsporidia may result either from recent, independent polyploidisation events, or from an ancestral tetraploid state followed by rediploidisation in diploid lineages, through a process similar to reductive division in gametogenesis or by inheritance of a single nucleus of a diplokaryon.

Fig 7. Age distributions of duplicate gene pairs.

Histograms showing synonymous divergence (Ks) distributions for candidate paralogous BUSCO gene pairs from representative diploid genomes. No evidence of recent rediploidisation events is seen, as there are no peaks against a background exponentially-decaying distribution coming from small-scale gene duplication events. The y-axis is highly variable due to different BUSCO gene family expansions occurring in different lineages, yielding larger counts of possible paralogous gene pairs. wgd was used to identify paralogous genes in every genome and compute Ks values [104]. The data underlying this figure can be found in File Collection 16 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

Evidence for rearrangements between homeologues and between genomes in Microsporidia

In addition to segmental duplications, we observe that some tetraploid microsporidian genomes carry signals of between-homeologue rearrangements. Those include inversions, fusions, fissions, and translocations that generated uneven haploid genomes that differed in gene content. To quantify how pronounced this phenomenon is in our genomes, we used a greedy algorithm to bin each tetraploid genome into four subgenome bins. Briefly, the algorithm iterates through contigs from largest to smallest, and appends a contig to a haplotype if the duplication in that contig does not exceed a specified threshold (see Github: https://github.com/Amjad-Khalaf/gerbil for implementation).
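
The greedy binning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the gerbil implementation: here "duplication" is taken to be the fraction of a contig's BUSCO genes already present in a bin, bins are tried in a fixed order, and contigs that fit no bin are set aside. The function name and data layout are hypothetical.

```python
def greedy_bin(contigs, n_bins=4, threshold=0.0):
    """contigs: list of (name, length, set_of_busco_ids). Returns (bins, unplaced)."""
    bins = [{"contigs": [], "buscos": set()} for _ in range(n_bins)]
    unplaced = []
    # Iterate from the largest contig to the smallest.
    for name, _length, buscos in sorted(contigs, key=lambda c: c[1], reverse=True):
        for b in bins:
            # Assumed duplication measure: share of this contig's BUSCOs already in the bin.
            dup = len(buscos & b["buscos"]) / len(buscos) if buscos else 0.0
            if dup <= threshold:
                b["contigs"].append(name)
                b["buscos"] |= buscos
                break
        else:
            unplaced.append(name)  # every bin would exceed the duplication threshold
    return bins, unplaced


# Toy tetraploid: four equivalent contigs plus one extra duplicated fragment.
toy = [
    ("c1", 400, {"b1", "b2"}), ("c2", 300, {"b1", "b2"}),
    ("c3", 200, {"b1", "b2"}), ("c4", 100, {"b1", "b2"}),
    ("c5", 50, {"b1"}),
]
bins, unplaced = greedy_bin(toy, n_bins=4, threshold=0.0)
```

In this toy case each of the four equivalent contigs lands in its own bin, which is the behaviour expected for an uncollapsed tetraploid with conserved gene content across subgenomes.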

idChiSpeb1.µ (host Chironomus sp. [Diptera]) is a perfect tetraploid, with no assembly collapse, and thus can be binned into four equal subgenomes for the vast majority of duplication thresholds tested (Fig 8). Similarly, we were able to partition the genomes of iuLoeVari1.µ (host Loensia variegata [Psocodea]) and idNerComm1.µ (host Neria commutata [Diptera]), recovering two incomplete subgenomes showing some genome collapse (Fig 8). On the other hand, for three other genomes, including ilAceEphe1.µ (host Acentria ephemerella [Lepidoptera]), ilMytImpu1.µ (host Mythimna impura [Lepidoptera]), and ihCicViri2.µ (host Cicadella viridis [Hemiptera]), we were unable to recover a haplotype with completeness similar to that of the unbinned genome assembly without also retaining a higher amount of duplication than expected (Fig 8 and S6 Fig). In line with this, the self-alignment dot plots of those genomes show numerous rearrangements and translocations (File Collection 6 at https://doi.org/10.5281/zenodo.17251512). We also found evidence of similar rearrangements and unevenness in other diploid and tetraploid purged microsporidian genomes, and in some genomes without known ploidies (self-alignment plots in File Collection 6 at https://doi.org/10.5281/zenodo.17251512). Together, these results provide evidence for extensive rearrangement between haploid subgenomes in microsporidian genomes, resulting in uneven gene distribution between haploid subgenomes. We examined whether between-haplotype rearrangements were associated with any genome metadata, and found no phylogenetic, host, or ecological association linking them.

Fig 8. Binning tetraploid genomes into four subgenomes using BUSCO genes.

(A) Using a greedy algorithm, we iterated through contigs from largest to smallest, appending a contig to a haplotypic subgenome if the duplication contributed by that contig does not exceed a specified threshold (x axes in the figure). Single-copy BUSCO gene completeness is marked by circles and multi-copy BUSCO gene completeness is marked by crosses. A red dashed line denotes the BUSCO completeness score of the unbinned assembly. For (B) idChiSpeb1.µ and (C) ilAceEphe1.µ, we plotted the largest 10 contigs in subgenome 1 with their BUSCO genes, and coloured these genes in the other subgenomes by their positions in subgenome 1. The BUSCO annotations underlying this figure can be found in File Collection 5 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using gerbil (https://github.com/Amjad-Khalaf/gerbil), and manually annotated using InkScape (version 1.2.2).

Given the above observations of synteny breakage between subgenomes within tetraploids, we expected to observe fractured synteny between species. Thus, we mapped the relative position of conserved orthologous BUSCO genes in all chromosome-level genome assemblies. Comparing closely-related genomes showed a pattern of dynamic change of microsporidian linkage groups, and a high rearrangement rate throughout Microsporidia (Fig 9). Confident reconstruction of ancestral linkage groups for all Microsporidia was not possible, likely because the rearrangement rate was too high (see S3 Text and S7–S12 Figs for details on rearrangements inferred and methods attempted).

Fig 9. Synteny plots of chromosomal microsporidian genome assemblies.

Genome-wide synteny plots of all chromosomal microsporidian genome assemblies for (A) Enterocytozoonida, Nosematida, and Neopereziida; and (B) Amblyosporida and the Orphan lineage. Each line represents a single-copy BUSCO (microsporidia_odb10) [76]. In (A) BUSCOs are painted by their chromosomal position in A. locustae, while in (B) they are painted by their chromosomal position in H. tvaerminnensis. The attached phylogeny is an ASTRAL phylogeny summarizing individual phylogenies of 600 BUSCO genes (microsporidia_odb10) [76]. The branch lengths were subsequently estimated using a concatenated alignment of the individual BUSCOs used, with IQ-TREE [77]. The BUSCO annotations underlying this figure can be found in File Collection 5 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using the ribbon plotting script in https://github.com/conchoecia/odp [109] and ToyTree [73], and manually annotated using InkScape (version 1.2.2).

Discussion

Forty new genome assemblies, a revised phylogeny, and the future of species delineation in Microsporidia

In this study, we present 32 new complete microsporidian genome sequences (including eight chromosome-level genomes, seven of which, to our knowledge, were scaffolded with the first implementation of Hi-C data on Microsporidia), and eight additional partial genome sequences (with BUSCO completeness score <70%). While not being the target organisms in the sequencing efforts used to generate these genome assemblies, most of the assemblies presented in this study have comparable or higher contiguity and BUSCO completeness than published microsporidian genomes (Fig 2). Our genome assembly process provides an accessible and successful approach for the assembly of microsporidian genomes from their host sequencing data, and can be translated to other cobionts where a wealth of sequenced host genomes is available.

The presented microsporidian genomes derive from five of the seven major microsporidian clades as defined by Bojko and colleagues [4]. They allowed us to revise microsporidian phylogeny, notably confirming Neopereziida as a sister group to the ancestor of Nosematida and Enterocytozoonida (Fig 2).

We show that genomic data support previous species delineations based on morphology, histopathology, and cell culture, and suggest divergence thresholds that differentiate species. While morphological and histopathological data remain important for identifying Microsporidia to species, it is likely that serendipitous discovery through large-scale genomic sequencing will be a major source of new microsporidian isolate genomes in the future. In prokaryotes, genomic metrics for species delineation, such as Average Nucleotide Identity (ANI) [110], are well accepted. ANI has also been used previously on Microsporidia [111], but the application of similar metrics to taxon delimitation in eukaryotes is in its infancy [112].

Tetraploidy in Microsporidia

On theoretical and observational grounds, the long-term maintenance of tetraploidy is unlikely. Usually, a new tetraploid lineage will revert to effective diploidy, either by losing one set of chromosomes or by reestablishment of exclusive pairing between homologs and doubling of the diploid chromosome number. The reestablishment of diploidy (rediploidisation) leaves a signature of whole genome duplication (WGD) in descendants, involving loss of copies of many genes and subfunctionalisation of retained genes [113]. WGD has been common during evolution of plants, animals, and fungi, and has been proposed to be a significant driver of organismal complexity and adaptive evolution [56]. The situation in Microsporidia differs from classical yeast, flowering plant, and animal models of tetraploidy and rediploidisation, as the constituent genomes appear to be largely maintained intact and show signatures of ongoing recombination between all four genomes.

In some microsporidian genomes, we observed nearly perfect synteny between haplotypes. However, we observed a striking level of between-haplotype rearrangement in three microsporidian genomes, two autotetraploids (ilMytImpu1.µ and ihCicViri2.µ) (Fig 8), and a genome with circumstantial evidence of being the result of a recent hybridization event between two closely related diploid genomes (Figs 3 and 8). We also see similar signatures, albeit on smaller scales, in other genomes (Self alignment plots in File Collection 6 at https://doi.org/10.5281/zenodo.17251512). In other taxa, such rearrangement is often associated with hybrid origin in polyploids, for example, in synthetic plant tetraploids such as Brassica allotetraploids [114–118] and autotetraploid Arabidopsis thaliana [119], and in the multiple origins of tetraploidy in lager brewing yeast Saccharomyces pastorianus [120–122]. Such rearrangement may be a common genomic response to the “genome shock” of the origin of tetraploidy [123,124].

We cannot currently distinguish a general genomic shock hypothesis for the origin of these rearranged microsporidian genomes from other possibilities, such as independent genetic disruption of reproductive processes, as seen in a number of yeast isolates with similar karyotype patterns [125]. Population-level whole-genome sequencing of selected species, over time (both in nature and in lab cell cultures), is required to better define this phenomenon and identify its root causes.

A diploid/tetraploid mating system as a potential reproductive model for Microsporidia

Understanding the origin of tetraploidy and its biology in Microsporidia has significant implications. If tetraploidy is ancient, Microsporidia would represent a rare case of a stable, species-rich tetraploid lineage, offering a unique opportunity to study genome evolution under these conditions. On the other hand, if tetraploidy is recent, Microsporidia would represent a clade with an unusually high propensity for polyploidisation. Additionally, if ploidy influences host specificity, this could inform strategies to mitigate the impact of Microsporidia on aquaculture and beekeeping or their potential as biological control agents, such as for malaria. Furthermore, a deeper understanding of their reproductive behavior could also aid in both managing and leveraging these organisms.

We identified many tetraploid microsporidian genomes, the majority of which are likely autotetraploids (Fig 3), but no concrete evidence that tetraploidy was the ancestral state in Microsporidia. We observed that homeologous subgenomes coalesce more recently than do the genomes of different species in all cases (Fig 4). However, this does not allow us to distinguish between ancient shared tetraploidy versus recent independent tetraploidy, as both models are likely to be indistinguishable in the presence of recombination within and between homeologous subgenomes during sexual reproduction. Similarly, in diploid lineages nested within tetraploids, we did not observe signals congruent with classic piecemeal rediploidisation (Fig 7). This suggests that tetraploidy in Microsporidia is independently acquired in the polyploid clades, or that diploid lineages have restored diploidy through one-step mechanisms that eliminate one diploid set of chromosomes. This could arise if a lineage was founded from one diploid unit (Fig 6), or through meiotic reduction division without subsequent fertilization.

If tetraploidy is the result of recent, independent events, a minimum of 15 events is predicted. We thus suggest that it is more likely that a propensity for tetraploidy is ancient in Microsporidia, and that the diploid lineages nested within tetraploids represent isolates or lineages that have undergone reductive division.

One of our key findings is that tetraploid microsporidian genomes are likely organized into two diploid units (Fig 6), with evidence of recombination both within and between units (Fig 5). Taking all our findings together, we propose that the units are the two nuclei of a diplokaryon (i.e., each nucleus is diploid), that Microsporidia undergo occasional sexual reproduction in a process that mirrors fungal reproduction (Fig 10), and that chromosomes independently reassort into the units in different individuals. Our interpretations of the data are in line with previous life cycle proposals of the diplokaryon being tetraploid, and a diploid/tetraploid cycling underpinning microsporidian life [51,52]. Furthermore, there is extensive morphological evidence of plasmogamy (cell fusion) and karyogamy (nuclear fusion) in Microsporidia [21,24,29–40,42,43], and these processes may reflect this proposed sexual cycle. We note that there is some Hi-C signal between sequences belonging to different diploid units in Fig 6. We propose that this is the result of read mismapping, or of compromised membranes between the two proposed nuclei. The latter may occur as a result of the fixation process in some of the cells, or as a genuine biological phenomenon. For instance, gaps between nuclear membranes have been noticed in two diplokarya [29,43]. Similarly, there is also evidence of Hi-C signal across nuclei in dikaryotic fungi [126].

Fig 10. Simplified proposed generalized lifecycle for Microsporidia.

Our proposed model posits that each nucleus is a diploid, and that microsporidian reproduction mirrors reproduction in Fungi with stages similar to karyogamy, plasmogamy, and a stable “heterokaryon” (known as a diplokaryon in Microsporidia). Importantly, both the diplokaryotic and monokaryotic phases are parasitic, and species may spend most of their lifecycle in one or the other phase, giving rise to “diploid” and “tetraploid” lineages. The figure was manually drawn using InkScape (version 1.2.2).

In this model, both diploids and tetraploids can function as infective agents (Fig 10) with mitotic capabilities, aligning with the previously proposed period of rapid clonal propagation by Corradi (2015) [52]. Support for this proposal also comes from life cycle observations. Several microsporidians exhibit complex life cycles, generating multiple spore types with different infective potentials from a variety of hosts. For example, diplokaryotic Edhazardia aedis spores infect adult mosquitoes, producing monokaryotic spores that go on to infect larvae, generating diplokaryotic spores [31]. Amblyospora connecticus cycles between mosquito and copepod hosts using monokaryotic and diplokaryotic spores for infecting each host, respectively [127]. This alternation of generations with different ploidies could explain why the four genomes in tetraploids remain intact, and have not been subjected to the usual processes of rediploidisation, as random assortment of genomes into the diploid would select for fully functional diploids whichever pair was combined.

In the proposed model, each diplokaryon nucleus is diploid, and their mitosis requires only pairing of two homologous chromosomes, as in any diploid. In the proposed tetraploid fusion, however, pairing and correct partitioning of the chromosomes to diploid daughter nuclei requires that all four homologous chromosomes are recognized and sorted coherently. In diploid cells, homologous chromosome pairing is mediated through sequence recognition. Minimally, the four copies of each chromosome in the proposed tetraploid fusion must carry some signature that means they can be recognized even if they diverge in sequence elsewhere such that they can no longer recombine.

Our model predicts that there should be diploid, “gametic” forms for all the tetraploid Microsporidia, and that tetraploid forms may exist for the described diploid lineages. We note that it remains unclear how ploidy maps on to life stages in most species, and this information is crucial to underpin the nature of microsporidian reproduction. Generation of single-cell, whole-genome data for Microsporidia, focusing especially on species with complex life cycles and multiple spore types would be highly informative.

While the diploidy of the microsporidian nucleus is broadly in line with morphological data, with most diploid species being monokaryotic, and most tetraploid species being diplokaryotic [53,54], there are two exceptions. Namely, tetraploid Agmasoma penaei, which possesses monokaryotic spores (suggesting each nucleus is tetraploid), and diploid Vittaforma corneae, which possesses diplokaryotic spores (suggesting each nucleus is haploid) [53,128]. Given that the morphological descriptions are not from the same isolates used to predict their ploidy, it is possible that these species are polymorphic, as observed in Microsporidia with complex, alternating life cycles and multiple spore types [41,129]. Future sampling of those lineages in particular will be crucial to determining whether each nucleus is truly diploid in Microsporidia.

In this work, we generated 40 high-quality genome assemblies, found signals consistent with recombination, and provided evidence suggesting that microsporidians are likely autotetraploids. While the timing and number of polyploidisation events remain uncertain, we propose that tetraploidy is an ancient feature of Microsporidia, with diploid lineages representing “reduced” forms. Population-level whole-genome sequencing, combined with longitudinal imaging in nature and laboratory cultures, will be crucial to illuminate the true nature of polyploidy, and its relationship to the lifecycle in Microsporidia.

Materials and methods

Data

Samples sequenced as part of the DToL project [68] were processed by the Tree of Life core laboratory and the Scientific Operations core at the Wellcome Sanger Institute. This process typically relies on using different parts of one individual, or two closely related individuals, to generate long-read DNA sequencing data and Hi-C short-read sequencing data. Because of this, specimens identified as microsporidian-infected from long-read sequencing data often did not have Hi-C data available.

To ensure we were able to generate Hi-C data for a subset of the microsporidian genomes we explored, we sampled 650 individual flying insects on the Wellcome Genome Campus using a Malaise trap in the summer of 2023. These specimens were first bisected, and DNA for long-read sequencing was extracted from one half. The other half was kept at −80°C for later Hi-C sequencing. We identified infected insects by PCR amplification testing of the long-read DNA extracts using a microsporidia-specific amplicon locus targeting the small subunit (SSU) rRNA hypervariable V1-V3 regions [130,131]. The primers used were V1F and R30R, with standard protocol as outlined in the literature [130,131]. Six specimens were identified as microsporidia-positive, and library preparation and genome sequencing of those six individuals, with ToLIDs idDelPlat3.µ, idTanUsma1.µ, idChiSpeb1.µ, idDelPlat4.µ, idLucSpea1.µ, and idDelPlat5.µ, was performed by the Scientific Operations core at the Wellcome Sanger Institute.

BioSpecimen identifiers for each dataset used in this study are listed in the S1 Table, along with the microsporidian genome assemblies.

ToLIDs and OTUs

In the manuscript, we use Tree of Life identifiers (ToLIDs) of the host individuals that were sequenced at the Wellcome Sanger Institute, with the suffix “.µ”, to refer to the microsporidian genome assemblies that resulted from them (see https://id.tol.sanger.ac.uk/ for more information). The full IDs the microsporidian genome assemblies are released under are listed in S1 Table, and do not contain the suffix “.µ”. Additionally, we use “gm” as a prefix for each OTU, in line with ToLID notation (g for “Fungi”, and m for “Microsporidia”).

Genome assembly

We identified 34 individual specimens sequenced for DToL as likely to be infected with a microsporidian using a MarkerScan [72] screen of their preliminary genome assemblies (generated using hifiasm version 0.19.9 [132]). DToL genome assemblies were screened on a rolling basis as they were generated, with a total of 1,200 genome assemblies screened to identify the infected specimens. For each infected individual, we concatenated the primary and alternate preliminary genome assemblies.

To positively identify microsporidian sequences, BlobToolKit [133] was run on each concatenated preliminary genome assembly. Contigs were filtered (“Filtering Step 1” in S2 Table) by a combination of average read coverage, GC content, and taxonomic classification of contigs, using upper and lower bounds that retained contigs that had BLASTx matches mostly to microsporidian proteins, and excluded contigs that had a majority of database matches to proteins from other taxa. The specific filtering parameters used for each genome assembly are reported in S2 Table.
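
A filter of this kind can be sketched as a simple predicate over per-contig summary values. The sketch below is illustrative only: the field names, the single taxon label per contig, and the example bounds are assumptions, and the real per-assembly parameters are those reported in S2 Table.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    coverage: float  # average read coverage
    gc: float        # GC fraction, 0-1
    taxon: str       # majority taxonomic assignment of protein matches


def keep_contig(c, cov_bounds, gc_bounds, target_taxon="Microsporidia"):
    """Keep a contig only if coverage, GC, and taxon all fall within the chosen bounds."""
    cov_ok = cov_bounds[0] <= c.coverage <= cov_bounds[1]
    gc_ok = gc_bounds[0] <= c.gc <= gc_bounds[1]
    return cov_ok and gc_ok and c.taxon == target_taxon


# Made-up contigs and bounds, for illustration only.
contigs = [
    Contig("c1", 40.0, 0.30, "Microsporidia"),
    Contig("c2", 400.0, 0.42, "Arthropoda"),
    Contig("c3", 38.0, 0.55, "Microsporidia"),
]
kept = [c.name for c in contigs if keep_contig(c, (20.0, 80.0), (0.20, 0.40))]
```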

The PacBio single-molecule HiFi long reads belonging to the preliminary assembly were aligned to the filtered microsporidian contigs using minimap2 (version 2.28) [95], and aligned reads were isolated using samtools (version 1.19.2, MAPQ = 255) [97]. To assess read coverage and ploidy, a k-mer spectrum was generated for the isolated reads using Jellyfish (k = 21, version 2.2.10) [134] and analyzed using GenomeScope2 (version 2.0) [82] (File Collection 1 at https://doi.org/10.5281/zenodo.17251512). Where samples had high average read coverage (>20×) and reliable ploidy estimation, Smudgeplot (version 0.4.0 “Arched”) was also run to confirm the ploidy assessment [82] (File Collection 8 at https://doi.org/10.5281/zenodo.17251512). We note that GenomeScope2 and Smudgeplot rely on heterozygosity for ploidy estimation [82], and while individuals with low heterozygosity can still have their ploidy correctly estimated, individuals with exactly 0% heterozygosity, such as some haploid selfing eukaryotes, are likely to be mis-classified.
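
For readers unfamiliar with k-mer spectra, the sketch below shows the underlying idea in plain Python: count every k-mer in the reads, then tabulate how many distinct k-mers occur at each multiplicity. It assumes canonical k-mers (a k-mer and its reverse complement counted together), which is a common convention but is not stated in the text, and it is far too slow for real data, where a dedicated counter such as Jellyfish is used.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k=21):
    """Return Counter {multiplicity: number of distinct canonical k-mers}."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


# Tiny example with k = 3 so the result can be checked by hand.
spectrum = kmer_spectrum(["ACGTA", "ACGTA"], k=3)
```

In a heterozygous polyploid, the resulting histogram shows several coverage peaks, and the relative heights of those peaks are what model-based tools such as GenomeScope2 fit to estimate ploidy.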

The isolated PacBio HiFi reads were then reassembled using hifiasm (version 0.19.9, parameter -l0 was used to disable purging) [132]. The contigs from the reassembly were filtered again (using https://github.com/Amjad-Khalaf/BubblePlot, “Filtering Step 2” in S2 Table) by average read depth, GC content, and taxonomic classification of contigs, with upper and lower bounds that retained all contigs that contained microsporidian BUSCO proteins (microsporidia_odb10, version 5.4.6) [76], and excluded contigs that contained proteins mostly belonging to other taxa. The specific filtering parameters used for each genome assembly are reported in S2 Table. Assembly quality was subsequently evaluated using MerquryFK (GitHub: https://github.com/thegenemyers/MERQURY.FK) (File Collection 9 at https://doi.org/10.5281/zenodo.17251512).

Where ploidy was successfully estimated from the isolated reads, that ploidy was assigned to the cleaned reassembly; otherwise, ploidy was marked as unresolved (“NA” in S1 Table).

If Hi-C data was available for the same individual that the long read data had been generated from, Hi-C reads were mapped to the cleaned reassembly using bwa-mem2 (version 2.2.1) [135], and the resulting alignment files were converted to contact maps using Juicer Tools (version 1.8.9) [136], bedtools (version 2.31.1) [96], and PretextMap (version 0.19) [137] (using the sanger-tol/curationpretext pipeline [102]). If Hi-C data provided sufficient signal, contigs were then manually scaffolded using PretextView (version 0.2.5) [103] (excluding multi-mapping Hi-C reads). A haploid representation of the genome assembly was generated by selecting the most contiguous copy of each chromosome with the least gaps. This was possible for seven microsporidian genome assemblies.
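
The selection of one representative copy per chromosome can be pictured as below. This is a hypothetical sketch: the record layout is invented, and the rule used here (fewest gaps first, longest sequence as a tie-break) is one reading of "most contiguous copy with the least gaps", not a documented procedure.

```python
def pick_haploid_representation(copies):
    """copies: list of dicts with keys 'chrom', 'copy', 'gaps', 'length'.

    Returns {chrom: chosen copy label}.
    """
    best = {}
    for c in copies:
        current = best.get(c["chrom"])
        # Fewer gaps wins; a longer sequence breaks ties (assumed tie-break).
        rank = (c["gaps"], -c["length"])
        if current is None or rank < (current["gaps"], -current["length"]):
            best[c["chrom"]] = c
    return {chrom: c["copy"] for chrom, c in best.items()}


# Made-up scaffold statistics for two chromosomes with four copies each.
toy_copies = [
    {"chrom": "chr1", "copy": "1A", "gaps": 2, "length": 900_000},
    {"chrom": "chr1", "copy": "1B", "gaps": 0, "length": 880_000},
    {"chrom": "chr1", "copy": "1C", "gaps": 0, "length": 910_000},
    {"chrom": "chr1", "copy": "1D", "gaps": 1, "length": 700_000},
    {"chrom": "chr2", "copy": "2A", "gaps": 1, "length": 500_000},
    {"chrom": "chr2", "copy": "2B", "gaps": 3, "length": 520_000},
]
chosen = pick_haploid_representation(toy_copies)
```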

If Hi-C data was not available, but the cleaned reassembly possessed sufficient coverage to estimate ploidy using GenomeScope2 (version 2.0) and Smudgeplot (version 0.4.0 “Arched”) [82], purge_dups was used to generate a haploid representation of the genome assembly [80] (File Collection 10 at https://doi.org/10.5281/zenodo.17251512). This process was followed for eight microsporidian genome assemblies.

In the cases where Hi-C data was not available, or the cleaned reassembly did not possess sufficient coverage to estimate ploidy, or the cleaned reassembly showed high levels of rearrangements between its subgenomes (regardless of having been assigned a ploidy), purge_dups [80] was not run. For these assemblies, the unresolved cleaned reassembly was considered the final assembly and used in further analyses. This was the case for 27 microsporidian genome assemblies. The data available and the paths followed for each microsporidian genome assembly are outlined in S1 Table.

The genome assembly and read metrics for all intermediate steps for each microsporidian genome assembly are reported in File Collection 11 at https://doi.org/10.5281/zenodo.17251512. The genomes generated in this study are available in File Collection 12 at https://doi.org/10.5281/zenodo.17251512.

Confirming that microsporidian genomes are derived from single species

We employed several checks to implicitly confirm that our microsporidian genomes are derived from samples with single-species infections. Firstly, the MarkerScan screen relies on identifying microsporidian ribosomal sequences in the arthropod reads, and would have identified multiple species of microsporidia had they been present in the sample [72]. Secondly, BlobToolKit plots would have highlighted whether identified microsporidian sequences belonged to different species. This is because any two co-infecting organisms are unlikely to maintain identical loads in their hosts, and they are unlikely to have identical sequence composition. Similarly, a mixed infection (whether it be multiple species or multiple strains of the same species) would have been evident in the GenomeScope2 histograms. We have explored this in detail previously (see [53] for examples of mixed infections). Furthermore, in the case of polyploid genomes, our gene-based species-delineation analyses would have highlighted the presence of multiple species. Thus, we determined that all of our 40 microsporidian genomes are derived from single-species infections. However, we note that it is not possible to distinguish whether genomes with low coverage came from single, clonal infections or infections involving multiple, closely-related strains of the same species.

Publicly available microsporidian genome assemblies

On the 1st of January 2025, we downloaded all microsporidian genome assemblies available in the NCBI Genome database. This retrieved 106 genome assemblies, whose accession numbers are listed in S5 Table.

Alignment, phylogeny, and species delineation

Unless stated otherwise, all phylogenies were generated by identifying orthologous proteins using BUSCO (microsporidia_odb10, version 5.4.6) [76], aligning each locus across all genomes using MAFFT (version 7.525) [90], inferring a tree for each of them using IQ-TREE (version 2.3.4) [77], and summarizing all the resulting gene trees into one species tree using ASTRAL (version 5.7.8) [75]. The branch lengths were subsequently estimated using a concatenated alignment of the individual BUSCOs used, with IQ-TREE [77]. The model chosen according to IQ-TREE’s model finder was “Q.yeast.I.G4”. In the case of multi-copy genes, one of the copies was chosen randomly. This was done because the majority of multi-copy genes across all the genomes showed that haplotypes displayed clear monophyly (see Results).

To infer species membership of unidentified genomes, we calculated average pairwise divergence between all possible genome pairs using the BUSCO gene phylogeny, and defined upper bounds for within-species divergence based on the genomes from identified microsporidian species from public data. We also explored generating a same-species threshold for each gene, using the distribution of branch lengths between genomes classified as belonging to the same species. Our results were consistent with those based on the whole-genome phylogeny branch lengths (see S4 and S5 Figs).
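The thresholding logic can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the pairwise distances, genome names, and species labels are invented, and the upper bound is taken as the largest distance between genomes of the same known species (one of several possible definitions).

```python
def same_species_upper_bound(distances, species_of):
    """Largest pairwise distance between genomes of the same known species."""
    within = [
        d
        for (a, b), d in distances.items()
        if a in species_of and b in species_of and species_of[a] == species_of[b]
    ]
    return max(within)


def candidate_species(query, distances, species_of, upper_bound):
    """Known species with a genome within `upper_bound` of the query."""
    hits = set()
    for (a, b), d in distances.items():
        if d > upper_bound:
            continue
        if a == query and b in species_of:
            hits.add(species_of[b])
        elif b == query and a in species_of:
            hits.add(species_of[a])
    return hits


# Hypothetical branch-length distances (amino acid substitutions per site).
species_of = {"ref1": "species X", "ref2": "species X", "ref3": "species Y"}
distances = {
    ("ref1", "ref2"): 0.010,
    ("ref1", "ref3"): 0.200,
    ("ref2", "ref3"): 0.210,
    ("new", "ref1"): 0.008,
    ("new", "ref2"): 0.011,
    ("new", "ref3"): 0.190,
}
bound = same_species_upper_bound(distances, species_of)
assignment = candidate_species("new", distances, species_of, bound)
```

Here the unidentified genome `new` falls within the bound of one `species X` reference, so it would be grouped with that species; an empty result would indicate a genome with no identified conspecific in the dataset.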

Other genomic approaches for species delineation, such as ANI, have been applied to Microsporidia [111]. However, we elected to use amino acid substitutions per site on a BUSCO gene phylogeny because this relies on pre-established orthology, rather than on the assumption that reciprocal BLAST hits represent orthologous sequences. Furthermore, many ANI approaches do not take into account the proportion of the genome that is represented in the reciprocal BLAST hits which are assumed to be orthologous. Thus, relying on amino acid substitutions per site in a universal set of orthologous genes like BUSCOs is less susceptible to noise.

Assessing if haplotypes coalesce prior to genomes

For every possible combination of any two tetraploid genomes from the list of complete genomes generated in this study, and all publicly available genomes, a tree topology test was performed using IQ-TREE (version 2.3.4) [77] on multi-copy BUSCO gene trees. Support for the topology where each genome’s haplotypes displayed clear monophyly was statistically assessed using the Approximately Unbiased Test [77,94]. Unpurged genome assemblies were used where available.
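The topology being tested can be illustrated with a small check of whether each genome's haplotypes form a clade in a gene tree. This sketch only classifies a given rooted topology; it does not perform the Approximately Unbiased test, which compares likelihoods of alternative topologies in IQ-TREE. The nested-tuple tree encoding and the `genome|haplotype` leaf labels are assumptions made for the example.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every subtree, single leaves included."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def haplotypes_monophyletic(tree, genome_of):
    """True if, for every genome, its haplotype copies form one clade."""
    all_clades = set(clades(tree))
    all_leaves = leaves(tree)
    for genome in {genome_of(leaf) for leaf in all_leaves}:
        members = frozenset(l for l in all_leaves if genome_of(l) == genome)
        if members not in all_clades:
            return False
    return True


def genome_of(leaf):
    """Hypothetical label scheme: 'genome|haplotype'."""
    return leaf.split("|")[0]


# Haplotypes group by genome (the pattern expected under autopolyploidy).
tree_plus = (
    (("A|h1", "A|h2"), ("A|h3", "A|h4")),
    (("B|h1", "B|h2"), ("B|h3", "B|h4")),
)
# Haplotypes group across genomes instead.
tree_minus = (
    (("A|h1", "B|h1"), ("A|h2", "B|h2")),
    (("A|h3", "B|h3"), ("A|h4", "B|h4")),
)
plus_result = haplotypes_monophyletic(tree_plus, genome_of)
minus_result = haplotypes_monophyletic(tree_minus, genome_of)
```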

Inference of historical rediploidisation events

To infer potential historical de-polyploidisation events, wgd (version 2) was run on each genome from the list of complete genomes generated in this study, and all publicly available genomes, to infer paralogous genes and compute their Ks values [104].

Genome annotation

Each final microsporidian genome assembly was annotated for repeats using RepeatModeler (version 2.0.5) and RepeatMasker (version 4.1.7) [78,79]. Repeat annotations for each genome assembly are reported in File Collection 13 at https://doi.org/10.5281/zenodo.17251512. We also annotated each final genome assembly for coding sequences using BRAKER2 [138], with protein hints from UniProt [139]. The resultant GeneMark-ES [140] annotation files can be found in File Collection 14 at https://doi.org/10.5281/zenodo.17251512.

Between-haplotype rearrangements

To illustrate between-haplotype rearrangements in some of our genomes, we developed a greedy algorithm that bins each of our unpurged tetraploid genomes into four subgenomes based on BUSCO gene completion and duplication (see Github: https://github.com/Amjad-Khalaf/gerbil for implementation). After sorting contigs from largest to smallest, our algorithm iterates through the contigs and assigns each contig to a subgenome bin if the duplication that the contig would add to its proposed subgenome bin does not exceed a specified threshold.
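A greedy binning of this kind can be sketched as follows. This is a simplified illustration and not the gerbil implementation: the first-fit order over bins, the definition of added duplication as the number of BUSCO genes already present in a bin, the handling of unplaced contigs, and all names are assumptions.

```python
def bin_contigs(contigs, n_bins=4, max_added_duplicates=0):
    """Greedily assign contigs to subgenome bins.

    `contigs` maps contig name -> (length, set of BUSCO ids on the contig).
    A contig joins the first bin in which it would duplicate at most
    `max_added_duplicates` BUSCO genes (a hypothetical threshold definition).
    """
    bins = [{"contigs": [], "buscos": set()} for _ in range(n_bins)]
    unplaced = []
    # Largest contigs first, as described in the text.
    ordered = sorted(contigs.items(), key=lambda item: item[1][0], reverse=True)
    for name, (_length, buscos) in ordered:
        for subgenome in bins:
            added = len(buscos & subgenome["buscos"])
            if added <= max_added_duplicates:
                subgenome["contigs"].append(name)
                subgenome["buscos"] |= buscos
                break
        else:
            unplaced.append(name)  # no bin could take it under the threshold
    return bins, unplaced


# Toy example with two bins to keep the trace short.
toy = {
    "c1": (500, {"g1", "g2"}),
    "c2": (400, {"g1", "g2"}),
    "c3": (300, {"g3"}),
    "c4": (200, {"g1"}),
    "c5": (100, {"g3"}),
}
toy_bins, toy_unplaced = bin_contigs(toy, n_bins=2, max_added_duplicates=0)
```

In the toy run, `c4` cannot be placed because both bins already hold `g1`; raising `max_added_duplicates` relaxes this, which is why results are worth comparing across several thresholds.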

The discussed patterns can be highlighted by examining synteny between the recovered haplotypes for one of the duplication thresholds tested (the x-axis in Fig 8). For idChiSpeb1.µ and ilAceEphe1.µ, we plotted the largest 10 contigs in haplotype 1 with their BUSCO genes, and coloured these genes in the other haplotypes by their positions in haplotype 1 (Fig 8). As expected, the haplotypes of idChiSpeb1.µ show nearly perfect synteny with one another. However, ilAceEphe1.µ displays a large number of rearrangements and haplotype-unique BUSCO genes (as seen by the pattern in Fig 8). In line with this, nearly all of idChiSpeb1.µ’s BUSCO genes are in 4 copies, distributed across 4 haplotypes (S6 Fig). On the other hand, the vast majority of ilAceEphe1.µ’s BUSCO genes are in fewer than 4 copies (despite its haploid coverage exceeding 48X, as seen in File Collection 1 at https://doi.org/10.5281/zenodo.17251512), and its BUSCO genes are not evenly distributed across its haplotypes. For instance, some BUSCO genes occur in 3 copies, all present in a single haplotype (S6 Fig). Together, these observations highlight the degree of rearrangement and unevenness in ilAceEphe1.µ, and similar scenarios are seen for ilMytImpu1.µ and ihCicViri2.µ.
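The copy-number summary described above (how many haplotypes carry each BUSCO gene, and in how many total copies) can be tallied as in the sketch below. The input layout and the toy data are assumptions for illustration and do not reproduce the values in S6 Fig.

```python
from collections import Counter


def tally_copies(haplotype_buscos):
    """Count BUSCO genes by (number of haplotypes carrying it, total copies).

    `haplotype_buscos` maps haplotype name -> list of BUSCO ids, with an id
    repeated once per copy found in that haplotype (hypothetical layout).
    """
    per_gene = {}
    for haplotype, genes in haplotype_buscos.items():
        for gene, n_copies in Counter(genes).items():
            per_gene.setdefault(gene, {})[haplotype] = n_copies
    tally = Counter()
    for counts in per_gene.values():
        tally[(len(counts), sum(counts.values()))] += 1
    return tally


# An even tetraploid: every gene once in each of four haplotypes.
even = {hap: ["g1", "g2"] for hap in ("hap1", "hap2", "hap3", "hap4")}
# An uneven one: g1 in three copies on a single haplotype, g2 in two haplotypes.
uneven = {"hap1": ["g1", "g1", "g1", "g2"], "hap2": ["g2"], "hap3": [], "hap4": []}

even_tally = tally_copies(even)
uneven_tally = tally_copies(uneven)
```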

Between-genome rearrangements

We mapped the relative position of conserved orthologous BUSCO genes in all chromosome-level genome assemblies, using ribbon plot scripts from https://github.com/conchoecia/odp [98], to compare synteny patterns in Microsporidia. To quantify and synthesize the observed patterns with an evolutionary perspective, we also ran syngraph to infer a set of putative ancestral linkage groups for these genomes [141]. Additionally, we ran an unsupervised clustering of loci based on their chromosomal occupancy [142,143], also aiming to highlight any recoverable putative ancestral linkage groups.

Supporting information

S1 Table. Microsporidian genome assembly statistics and host metadata.

Full list of recovered microsporidian genome assemblies, their associated metadata, and host metadata.

(XLSX)

pbio.3003446.s001.xlsx (40.1KB, xlsx)
S2 Table. Filtering parameters used in generating genome assemblies.

Parameters used for filtering microsporidian contigs from their respective (meta-)genomic assemblies in filtering steps 1 (BlobToolKit [133]) and 2 (BubblePlot, Github: https://github.com/Amjad-Khalaf/BubblePlot). See Materials and methods for details.

(XLSX)

pbio.3003446.s002.xlsx (7.4KB, xlsx)
S3 Table. Trait-phylogeny regression.

Transformations representing the fit between the tree’s topology (λ), branch lengths (κ), and root–tip distance (δ) [83], and the number of coding sequences, transposable element loads, and genome spans.

(XLSX)

pbio.3003446.s003.xlsx (7.8KB, xlsx)
S4 Table. Trait correlation.

Correlations between transposable element loads and genome spans.

(XLSX)

pbio.3003446.s004.xlsx (7.9KB, xlsx)
S5 Table. Accession numbers for publicly available genomes used in this study.

On the 1st of January 2025, we downloaded all microsporidian genome assemblies available in the NCBI Genome database. This retrieved the following 106 genome assemblies.

(XLSX)

pbio.3003446.s005.xlsx (6.7KB, xlsx)
S6 Table. Branch length distances for species delineation.

Pairwise branch length distances which include one of our genomes, and can be classified to a species or a genus. The conservative branch length threshold range was defined using the shortest observed branch lengths between known same-species genomes for the lower bound (0) and the smallest distance between H. tvaerminnensis and H. magnivora genomes for the upper bound (0.012). The relaxed threshold uses the full range of observed branch lengths among known same-species genomes (excluding the H. tvaerminnensis–H. magnivora cutoff).

(XLSX)

pbio.3003446.s006.xlsx (16.8KB, xlsx)
S1 Text. Newick string of phylogeny.

ASTRAL [75] phylogeny summarizing individual phylogenies of 600 BUSCO genes (microsporidia_odb10) [76] across all publicly available microsporidian genome assemblies (n = 106), and the genome assemblies generated in this study (n = 40, marked in purple). Branch lengths were estimated with IQ-TREE using a concatenated alignment of the individual BUSCOs [77]. The model chosen according to IQ-TREE’s model finder was “Q.yeast.I.G4”.

(TXT)

pbio.3003446.s007.txt (8.4KB, txt)
S2 Text. Approximately Unbiased phylogenetic test results.

BUSCO (microsporidia_odb10, version 5.4.6) (Simão and colleagues 2015) was run on the unpurged genome assemblies of the tetraploid genomes. The haplotypes of each BUSCO locus were aligned to one another using MAFFT (version 7.525) (Katoh and colleagues 2002), and a phylogeny was generated for each alignment using IQ-TREE (version 2.3.4, with ModelFinder enabled and 1,000 bootstrap replicates) (Minh and colleagues 2020; Kalyaanamoorthy and colleagues 2017). The Approximately Unbiased statistical test [90] was then performed on the multi-copy BUSCO gene phylogenies for all pairwise combinations of tetraploid microsporidian genomes. A high-level summary of these pairwise tests is included in this text. For each listed pairwise comparison, the “+” sign indicates the number of phylogenies where haplotypes coalesce more recently than species, and the “−” sign indicates the number of phylogenies where species coalesce more recently than haplotypes.

(TXT)

pbio.3003446.s008.txt (33.8KB, txt)
S3 Text. Details on rearrangements inferred and methods attempted.

(TXT)

pbio.3003446.s009.txt (2.1KB, txt)
S1 Script. Plotting histograms depicting phylogenetic branch lengths (in amino acid substitutions per site) between homeologous gene pairs for 13 tetraploid genomes.

Python script used to extract pairwise branch lengths between homeologous gene pairs for 13 tetraploid genomes, and plot them as histograms. Please note that the following genomes are represented by deprecated ToLIDs, which differ from the ones used in this manuscript. iyOecSmar33: idDelPlat3; iyOecSmar35: idTanUsma1; iyOecSmar39: idChiSpeb1; iyOecSmar41: idDelPlat4; and iyOecSmar44: idDelPlat5.

(PY)

pbio.3003446.s010.py (3.3KB, py)
S1 Fig. Sex of hosts the microsporidian genome assemblies are derived from.

The sex of our genomes’ hosts was unknown in most cases (24 species). In the remaining cases, nine were identified as female and seven as male. A relatively equal proportion of female and male hosts are infected with Nosematida (Fig 1), but we could not assess skews in host sex ratios for other microsporidian groups due to missing data on sex (for Amblyosporida-infected hosts), or a small sample size (for Neopereziida-infected hosts). The data underlying this figure can be found in S1 Table. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s011.png (14.9MB, png)
S2 Fig. Ploidy inference examples for three microsporidian genomes, highlighting segmental duplications.

GenomeScope2 transformed linear plot and Smudgeplot [82], respectively, for (A), (B) diploid iyCepSpin2.µ (host Cephus spinipes [Hymenoptera]); (C), (D) diploid idPhaFune2.µ (host Phania funesta [Diptera]); and (E), (F) polyploid (tetraploid or octoploid) iiMysAzur1.µ (host Mystacides azureus [Trichoptera]). Jellyfish was used to generate the initial k-mer spectra (k = 21, version 2.2.10) [134]. Both iyCepSpin2.µ and idPhaFune2.µ have mostly diploid genomes, but carry a level of duplication that generated an identifiable “tetraploid” signal in their k-mer spectra. Similarly, the k-mer spectrum of iiMysAzur1.µ can be interpreted as either a highly homozygous tetraploid where large segmental duplications have occurred in all four copies, leading to a detectable octoploid signal, or an octoploid genome composed of two distinct tetraploids. Such cases are common, with some level of segmental duplication observed in nearly all of the 14 polyploid genomes (refer to File Collection 1 at https://doi.org/10.5281/zenodo.17251512). The genomes used to generate this figure can be found in File Collection 12 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using GenomeScope2 [82], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s012.png (3.5MB, png)
S3 Fig. 600 Gene Phylogeny of Microsporidia.

(A) ASTRAL [75] phylogeny summarizing individual phylogenies of 600 BUSCO genes (microsporidia_odb10) [76] across all publicly available microsporidian genome assemblies (including multiple strains where they are available), and the genome assemblies generated in this study (n = 40, marked in purple). Branch lengths were estimated with IQ-TREE using a concatenated alignment of the individual BUSCOs [77]. Nodes with less than 95% support are marked with pink circles. Ploidy is marked in circles at the tips of the tree for genomes where it was characterizable. (B) Genome assembly span (Mb) as calculated by assembly-stats (Github: https://github.com/sanger-pathogens/assembly-stats), with black circles marking chromosome-level genome assemblies. (C) N50 values (Mb) as calculated by assembly-stats (Github: https://github.com/sanger-pathogens/assembly-stats), with asterisks marking purged genome assemblies. (D) BUSCO gene (microsporidia_odb10) completeness percentage, marked in green for single-copy genes, and beige for duplicated genes. (E) Transposable element percentage as predicted by RepeatModeler and RepeatMasker [78,79], marked in burgundy for retroelements, peach for DNA transposons, and blue for rolling circles. Neop.: Neopereziida; Or. Lin.: Orphan Lineage. The data underlying A can be found in S1 Text. The data underlying B, C, D, and E can be found in S1 Table. The figure was generated using ToyTree [73], and manually annotated using InkScape (version 1.2.2).

(ZIP)

pbio.3003446.s013.zip (5.8MB, zip)
S4 Fig. Comparison of whole-genome phylogeny species delineation thresholds and individual gene phylogeny branch length distribution species delineation thresholds.

The approach we presented in the main text relies on branch lengths derived from the whole-genome phylogeny in Fig 2 (i.e., a concatenated supermatrix of genes). We re-estimated same-species branch length thresholds for each gene. For each gene, we used the distribution of branch lengths between genomes known to belong to the same species, and measured each distribution’s mean and 95th percentile. The upper threshold was then set by retrieving the highest observed 95th percentile (orange dashed line) and the highest observed mean (magenta dashed line). While the percentage of genes exceeding each threshold varies for each genome, they are relatively consistent, and lead to the same OTU assignment and the same conclusions when investigating tetraploid species. ilAceEphe1.µ still stands out as possessing more genes which exceed the same-species threshold (no matter what threshold was used) than other genomes. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s014.png (215.6KB, png)
S5 Fig. Relationship between whole-genome phylogeny species delineation thresholds and individual gene phylogeny branch length distribution species delineation thresholds.

We compared our two gene-based metrics (highest 95th percentile and highest mean of branch length distributions of individual gene trees for genomes known to belong to the same species) to the whole-genome-based metric (highest branch length observed between any two same species genomes). We found the relationship between them to be consistent and linear, in line with the fact that they lead to the same conclusions. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s015.png (118.2KB, png)
S6 Fig. Tetraploid ilAceEphe1.µ is uneven and rearranged.

The number of BUSCO genes found in X haplotypes, along with their total copy number. idChiSpeb1.µ is an even tetraploid, so nearly all its BUSCO genes are in 4 copies, distributed across 4 haplotypes. On the other hand, ilAceEphe1.µ is an uneven tetraploid. The majority of its BUSCO genes are in fewer than 4 copies, and they are not evenly distributed across its haplotypes. For instance, some BUSCO genes occur in 3 copies present only in a single haplotype. The figure was generated using gerbil (Github: https://github.com/Amjad-Khalaf/gerbil), and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s016.png (164KB, png)
S7 Fig. Phylogeny used by Syngraph, with its internal node labeling.

Each node is labeled with its Syngraph name in a gray box. Yellow boxes indicate the number of chromosomes each genome possesses, and blue boxes indicate the number of chromosomes which possess BUSCO gene markers. The figure was generated using ToyTree [73], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s017.png (380.2KB, png)
S8 Fig. Number of chromosomes inferred at each node is highly variable.

The number of chromosomes inferred for each node, and the total number of BUSCO genes assigned to a chromosome, for each value of “m”. “m” is the Syngraph parameter that sets the minimum number of genes that must travel together for an event to be counted as a rearrangement. For example, if m = 3, only rearrangements involving 3 or more genes will be counted. Deep nodes are highly variable and their karyotype (and thus the number of rearrangements that have occurred along each branch) cannot be estimated reliably. See S7 Fig for node labels on the phylogeny. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s018.png (577.5KB, png)
S9 Fig. T-SNE plot depicting BUSCO linkage groups across the microsporidian phylogeny.

Each point represents a BUSCO gene, positioned based on its co-occurrence profile across the chromosome-level microsporidian genomes. Distances between points reflect similarities in co-occurrence. Points are coloured by their assigned chromosome in Antonospora locustae. This disorganized pattern illustrates that the rate of rearrangement is too high for a reliable complete reconstruction of putative ancestral linkage groups. The large-scale patterns are influenced by more densely sampled taxa (see S10 Fig). The data underlying this figure can be found in File Collection 5 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using Scikit-learn [142,143] and Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s019.png (124.8KB, png)
S10 Fig. T-SNE plot depicting BUSCO linkage groups across the microsporidian phylogeny, highlighting clustering influence by more densely sampled taxa.

Each point represents a BUSCO gene, positioned based on its co-occurrence profile across the chromosome-level microsporidian genomes. Distances between points reflect similarities in co-occurrence. Points are coloured by their assigned chromosome in Encephalitozoon cuniculi. This disorganized pattern illustrates that the rate of rearrangement is too high for a reliable complete reconstruction of putative ancestral linkage groups. The large-scale patterns are influenced by more densely sampled taxa, such as Encephalitozoon cuniculi. The data underlying this figure can be found in File Collection 5 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using Scikit-learn [142,143] and Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s020.png (108.8KB, png)
S11 Fig. Synteny plots of chromosomal microsporidian genome assemblies.

Genome-wide synteny plots of all available chromosomal microsporidian genome assemblies. Each line represents a single-copy BUSCO (microsporidia_odb10) [76]. BUSCOs are painted by their chromosomal position in A. locustae. The data underlying this figure can be found in File Collection 5 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using ribbon plot scripts from https://github.com/conchoecia/odp [109] and ToyTree [73], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s021.png (27.2MB, png)
S12 Fig. Synteny plots of chromosomal microsporidian genome assemblies.

Genome-wide synteny plots of all available chromosomal microsporidian genome assemblies. Each line represents a single-copy BUSCO (microsporidia_odb10) [76]. BUSCOs are painted by their chromosomal position in H. tvaerminnensis. The figure was generated using ribbon plot scripts from https://github.com/conchoecia/odp [109] and ToyTree [73], and manually annotated using InkScape (version 1.2.2).

(PNG)

pbio.3003446.s022.png (26.2MB, png)

Acknowledgments

We thank Dr Lewis Stevens, Dr Jamie Bojko, and Dr Yuliya Y Sokolova for their insight, their kindness, and generosity with their time in discussing these results with us over the last year. We also warmly thank our colleagues at the Tree of Life Programme, Wellcome Sanger Institute for their support, and comradery.

Abbreviations

ANI

Average Nucleotide Identity

BUSCO

Benchmarking Universal Single-Copy Orthologs

DToL

Darwin Tree of Life

OTU

operational taxonomic unit

ToLIDs

Tree of Life identifiers

WGD

whole genome duplication

Data Availability

All relevant data and Supporting information are available in full on Zenodo at https://doi.org/10.5281/zenodo.17251512. Small supporting figures and tables were attached to the submission as well.

Funding Statement

This work was funded in whole by the Wellcome Trust (grant number 220540/Z/20/A). Funding was awarded to authors MKNL and MB. The funders did NOT play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. Wellcome Trust URL: https://wellcome.org/.

References

  • 1.Keeling P. Five questions about microsporidia. PLoS Pathog. 2009;5(9):e1000489. doi: 10.1371/journal.ppat.1000489 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Nageli C. uber die neue Krankheit der Seidenraupe und verwandte Organismen. [Abstract of report before 33. Versamml. Deutsch. Naturf. u. Aerzte. Bonn, 21 Sept.]. Bot Ztg. 1857;15: 760–761. Available from: https://cir.nii.ac.jp/crid/1573105974684833920 [Google Scholar]
  • 3.Pasteur L. Etudes sur la maladie des vers à soie: 2.: Notes et documents. Gauthier-Villars; 1870. Available from: https://play.google.com/store/books/details?id=y-1rmRQoAa4C
  • 4.Bojko J, Reinke AW, Stentiford GD, Williams B, Rogers MSJ, Bass D. Microsporidia: a new taxonomic, evolutionary, and ecological synthesis. Trends Parasitol. 2022;38(8):642–59. doi: 10.1016/j.pt.2022.05.007 [DOI] [PubMed] [Google Scholar]
  • 5.Stentiford GD, Feist SW, Stone DM, Bateman KS, Dunn AM. Microsporidia: diverse, dynamic, and emergent pathogens in aquatic systems. Trends Parasitol. 2013;29(11):567–78. doi: 10.1016/j.pt.2013.08.005 [DOI] [PubMed] [Google Scholar]
  • 6.Han B, Pan G, Weiss LM. Microsporidiosis in humans. Clin Microbiol Rev. 2021;34(4):e0001020. doi: 10.1128/CMR.00010-20 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Becnel JJ, Weiss LM. Publication: USDA ARS. ars.usda.gov; 2014 [cited 12 Dec 2023]. Available from: https://www.ars.usda.gov/research/publications/publication/?seqNo115=310041
  • 8.Weber R, Bryan RT, Schwartz DA, Owen RL. Human microsporidial infections. Clin Microbiol Rev. 1994;7(4):426–61. doi: 10.1128/CMR.7.4.426 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Bryan RT, Cali A, Owen RL, Spencer HC. Microsporidia: opportunistic pathogens in patients with AIDS. Prog Clin Parasitol. 1991;2: 1–26. Available from: https://www.ncbi.nlm.nih.gov/pubmed/1893116 [PubMed] [Google Scholar]
  • 10.Matsubayashi H, Koike T, Mikata I, Takei H, Hagiwara S. A case of Encephalitozoon-like body infection in man. AMA Arch Pathol. 1959;67: 181–187. Available from: https://www.ncbi.nlm.nih.gov/pubmed/13616827 [PubMed] [Google Scholar]
  • 11.Bojko J, Stentiford GD. Microsporidian pathogens of aquatic animals. Exp Suppl. 2022;114:247–83. doi: 10.1007/978-3-030-93306-7_10 [DOI] [PubMed] [Google Scholar]
  • 12.Thomas SR, Elkinton JS. Pathogenicity and virulence. J Invertebr Pathol. 2004;85(3):146–51. doi: 10.1016/j.jip.2004.01.006 [DOI] [PubMed] [Google Scholar]
  • 13.Anderson DL, Giacon H. Reduced pollen collection by honey bee (Hymenoptera: Apidae) colonies infected with Nosema apis and sacbrood virus. J Econ Entomol. 1992;85(1):47–51. doi: 10.1093/jee/85.1.47 [DOI] [Google Scholar]
  • 14.Fries I, Ekbohm G, Villumstad E. Nosema apis, sampling techniques and honey yield. J Apic Res. 1984;23(2):102–5. doi: 10.1080/00218839.1984.11100617 [DOI] [Google Scholar]
  • 15.Vyas-Patel N. The Suppression of Plasmodium berghei in Anopheles coluzzii infected later with Vavraia culicis. Cold Spring Harbor Laboratory; 2023. doi: 10.1101/2023.02.05.527158 [DOI] [Google Scholar]
  • 16.Akorli J, Akorli EA, Tetteh SNA, Amlalo GK, Opoku M, Pwalia R, et al. Microsporidia MB is found predominantly associated with Anopheles gambiae s.s and Anopheles coluzzii in Ghana. Sci Rep. 2021;11(1):18658. doi: 10.1038/s41598-021-98268-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Herren JK, Mbaisi L, Mararo E, Makhulu EE, Mobegi VA, Butungi H, et al. A microsporidian impairs Plasmodium falciparum transmission in Anopheles arabiensis mosquitoes. Nat Commun. 2020;11(1):2187. doi: 10.1038/s41467-020-16121-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Lee SC, Heitman J, Ironside JE. Sex and the Microsporidia. Microsporidia: pathogens of oppurtunity; 2014. Available from: https://books.google.com/books?hl=en&lr=&id=k4kZBAAAQBAJ&oi=fnd&pg=PA231&dq=microsporidia+ploidy&ots=oqF0tRPXyc&sig=Ua18f4MX3vgjvbaipK2DYB1v3Kc
  • 19.Amigó JM, Gracia MP, Salvadó H, Vivarés CP. Pulsed field gel electrophoresis of three microsporidian parasites of fish. Acta Protozool. 2002;41:11–6. [Google Scholar]
  • 20.Bernander R, Palm JE, Svärd SG. Genome ploidy in different stages of the Giardia lamblia life cycle. Cell Microbiol. 2001;3(1):55–62. doi: 10.1046/j.1462-5822.2001.00094.x [DOI] [PubMed] [Google Scholar]
  • 21.Vávra J. Development of the microsporidia. Biology of the Microsporidia. Springer US; 1976. p. 87–109. doi: 10.1007/978-1-4684-3114-8_3 [DOI] [Google Scholar]
  • 22.Youssef NN, Hammond DM. The fine structure of the developmental stages of the microsporidian Nosema apis Zander. Tissue Cell. 1971;3(2):283–94. doi: 10.1016/s0040-8166(71)80023-x [DOI] [PubMed] [Google Scholar]
  • 23.Cali A. Morphogenesis in the genus Nosema. Proc IVth Int Colloq Insect Pathol. 1971;4:104–12. [Google Scholar]
  • 24.Sprague V, Vernick SH. The ultrastructure of Encephalitozoon cuniculi (Microsporida, Nosematidae) and its taxonomic significance. J Protozool. 1971;18(4):560–9. doi: 10.1111/j.1550-7408.1971.tb03376.x [DOI] [PubMed] [Google Scholar]
  • 25.Miquel J, Kacem H, Baz-González E, Foronda P, Marchand B. Ultrastructural and molecular study of the microsporidian Toguebayea baccigeri n. gen., n. sp., a hyperparasite of the digenean trematode Bacciger israelensis (Faustulidae), a parasite of Boops boops (Teleostei, Sparidae). Parasite. 2022;29:2. doi: 10.1051/parasite/2022007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Pretto T, Montesi F, Ghia D, Berton V, Abbadi M, Gastaldelli M, et al. Ultrastructural and molecular characterization of Vairimorpha austropotamobii sp. nov. (Microsporidia: Burenellidae) and Thelohania contejeani (Microsporidia: Thelohaniidae), two parasites of the white-clawed crayfish, Austropotamobius pallipes complex (Decapoda: Astacidae). J Invertebr Pathol. 2018;151:59–75. doi: 10.1016/j.jip.2017.11.002 [DOI] [PubMed] [Google Scholar]
  • 27.Maurand J, Vey A. Histopathological and ultrastructural studies of Theolobania contejeani (Microsporida, Nosematidae) a parasite of the crayfish Austropotamobius pallipes Lereboullet. Ann Parasitol Hum Comp. 1973;48(3):411–21. doi: 10.1051/parasite/1973483411 [DOI] [PubMed] [Google Scholar]
  • 28.Weidner E. Ultrastructural study of microsporidian development. Z Zellforsch. 1970;105(1):33–54. doi: 10.1007/bf00340563 [DOI] [PubMed] [Google Scholar]
  • 29.Sokolova YY, Fuxa JR. Biology and life-cycle of the microsporidium Kneallhazia solenopsae Knell Allan Hazard 1977 gen. n., comb. n., from the fire ant Solenopsis invicta. Parasitology. 2008;135(8):903–29. doi: 10.1017/S003118200800440X [DOI] [PubMed] [Google Scholar]
  • 30.Becnel JJ. Horizontal transmission and subsequent development of Amblyospora californica (Microsporida: Amblyosporidae) in the intermediate and definitive hosts. Dis Aquat Organ. 1992;13: 17–28. Available from: https://www.int-res.com/articles/dao/13/d013p017.pdf [Google Scholar]
  • 31.Becnel JJ, Sprague V, Fukuda T, Hazard EI. Development of Edhazardia aedis (Kudo, 1930) n. g., n. comb. (Microsporida: Amblyosporidae) in the mosquito Aedes aegypti (L.) (Diptera: Culicidae). J Protozool. 1989;36(2):119–30. doi: 10.1111/j.1550-7408.1989.tb01057.x [DOI] [PubMed] [Google Scholar]
  • 32.Canning EU. Nuclear division and chromosome cycle in microsporidia. Biosystems. 1988;21(3–4):333–40. doi: 10.1016/0303-2647(88)90030-5 [DOI] [PubMed] [Google Scholar]
  • 33.Becnel JJ, Hazard EI, Fukuda T, Sprague V. Life cycle of Culicospora magna (Kudo, 1920) (Microsporida: Culicosporidae) in Culex restuans Theobald with special reference to sexuality. J Protozool. 1987;34(3):313–22. doi: 10.1111/j.1550-7408.1987.tb03182.x [DOI] [Google Scholar]
  • 34.Hazard EI, Fukuda T, Becnel JJ. Gametogenesis and plasmogamy in certain species of Microspora. J Invertebr Pathol. 1985;46(1):63–9. doi: 10.1016/0022-2011(85)90130-2 [DOI] [Google Scholar]
  • 35.Hazard EI, Brookbank JW. Karyogamy and meiosis in an Amblyospora sp. (Microspora) in the mosquito Culex salinarius. J Invertebr Pathol. 1984;44(1):3–11. doi: 10.1016/0022-2011(84)90039-9 [DOI] [Google Scholar]
  • 36.Hazard EI, Andreadis TG, Joslyn DJ, Ellis EA. Meiosis and its implications in the life cycles of Amblyospora and Parathelohania (Microspora). J Parasitol. 1979;65(1):117. doi: 10.2307/3280215 [DOI] [Google Scholar]
  • 37.Vivares CP, Sprague V. The fine structure of Ameson pulvis (Microspora, Microsporida) and its implications regarding classification and chromosome cycle. J Invertebr Pathol. 1979;33(1):40–52. doi: 10.1016/0022-2011(79)90128-9 [DOI] [Google Scholar]
  • 38.Loubès C. Meiosis in Microsporidia: effects on biological cycles. J Protozool. 1979;26(2):200–8. doi: 10.1111/j.1550-7408.1979.tb02761.x [DOI] [PubMed] [Google Scholar]
  • 39.Loubès C, Maurand J, Rousset-Galangau V. Presence of synaptonematic complexes in the biological cycle of Gurleya chironomi Loubes and Maurand, 1975: an argument in favor of sexuality in microsporidia. C R Acad Hebd Seances Acad Sci D. 1976;282(10):1025–7. [PubMed] [Google Scholar]
  • 40.Desportes I. Ultrastructure de Stempellia mutabilis Leger et Hesse, microsporidie parasite de l’éphémère Ephemera vulgata L.; 1976. Available from: https://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=PASCAL7750051096
  • 41.Cali A, Becnel JJ, Takvorian PM. Microsporidia. Handbook of the protists. Springer International Publishing; 2017. p. 1559–618. doi: 10.1007/978-3-319-28149-0_27 [DOI] [Google Scholar]
  • 42.Sokolova YY, Dolgikh VV, Morzhina EV, Nassonova ES, Issi IV, Terry RS, et al. Establishment of the new genus Paranosema based on the ultrastructure and molecular phylogeny of the type species Paranosema grylli Gen. Nov., Comb. Nov. (Sokolova, Selezniov, Dolgikh, Issi 1994), from the cricket Gryllus bimaculatus Deg. J Invertebr Pathol. 2003;84(3):159–72. doi: 10.1016/j.jip.2003.10.004 [DOI] [PubMed] [Google Scholar]
  • 43.Nassonova ES, Smirnov AV. Synaptonemal complexes as evidence for meiosis in the life cycle of the monomorphic diplokaryotic microsporidium Paranosema grylli. Eur J Protistol. 2005;41(3):175–81. doi: 10.1016/j.ejop.2005.02.001 [DOI] [Google Scholar]
  • 44.Angst P, Ebert D, Fields PD. Demographic history shapes genomic variation in an intracellular parasite with a wide geographical distribution. Mol Ecol. 2022;31(9):2528–44. doi: 10.1111/mec.16419 [DOI] [PubMed] [Google Scholar]
  • 45.Sagastume S, Martín-Hernández R, Higes M, Henriques-Gil N. Genotype diversity in the honey bee parasite Nosema ceranae: multi-strain isolates, cryptic sex or both?. BMC Evol Biol. 2016;16(1):216. doi: 10.1186/s12862-016-0797-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Krebes L, Zeidler L, Frankowski J, Bastrop R. (Cryptic) sex in the microsporidian Nosema granulosis—evidence from parasite rDNA and host mitochondrial DNA. Infect Genet Evol. 2014;21:259–68. doi: 10.1016/j.meegid.2013.11.007 [DOI] [PubMed] [Google Scholar]
  • 47.Cuomo CA, Desjardins CA, Bakowski MA, Goldberg J, Ma AT, Becnel JJ, et al. Microsporidian genome analysis reveals evolutionary strategies for obligate intracellular growth. Genome Res. 2012;22(12):2478–88. doi: 10.1101/gr.142802.112 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Haag KL, Traunecker E, Ebert D. Single-nucleotide polymorphisms of two closely related microsporidian parasites suggest a clonal population expansion after the last glaciation. Mol Ecol. 2013;22(2):314–26. doi: 10.1111/mec.12126 [DOI] [PubMed] [Google Scholar]
  • 49.Ironside JE. Diversity and recombination of dispersed ribosomal DNA and protein coding genes in microsporidia. PLoS One. 2013;8(2):e55878. doi: 10.1371/journal.pone.0055878 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Sagastume S, Del Águila C, Martín-Hernández R, Higes M, Henriques-Gil N. Polymorphism and recombination for rDNA in the putatively asexual microsporidian Nosema ceranae, a pathogen of honeybees. Environ Microbiol. 2011;13(1):84–95. doi: 10.1111/j.1462-2920.2010.02311.x [DOI] [PubMed] [Google Scholar]
  • 51.Pelin A, Selman M, Aris-Brosou S, Farinelli L, Corradi N. Genome analyses suggest the presence of polyploidy and recent human-driven expansions in eight global populations of the honeybee pathogen Nosema ceranae. Environ Microbiol. 2015;17(11):4443–58. doi: 10.1111/1462-2920.12883 [DOI] [PubMed] [Google Scholar]
  • 52.Corradi N. Microsporidia: eukaryotic intracellular parasites shaped by gene loss and horizontal gene transfers. Annu Rev Microbiol. 2015;69:167–83. doi: 10.1146/annurev-micro-091014-104136 [DOI] [PubMed] [Google Scholar]
  • 53.Khalaf A, Lawniczak MKN, Blaxter ML, Jaron KS. Polyploidy is widespread in Microsporidia. Microbiol Spectr. 2024;12(2):e0366923. doi: 10.1128/spectrum.03669-23 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Stratton CE, Bolds SA, Reisinger LS, Behringer DC, Khalaf A, Bojko J. Microsporidia and invertebrate hosts: genome-informed taxonomy surrounding a new lineage of crayfish-infecting Nosema spp. (Nosematida). Fungal Divers. 2024;128(1):167–90. doi: 10.1007/s13225-024-00543-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Sokolova YY, Overstreet RM. A new microsporidium, Apotaspora heleios n. g., n. sp., from the Riverine grass shrimp Palaemonetes paludosus (Decapoda: Caridea: Palaemonidae). J Invertebr Pathol. 2018;157:125–35. doi: 10.1016/j.jip.2018.05.007 [DOI] [PubMed] [Google Scholar]
  • 56.Otto SP. The evolutionary consequences of polyploidy. Cell. 2007;131(3):452–62. doi: 10.1016/j.cell.2007.10.022 [DOI] [PubMed] [Google Scholar]
  • 57.Spoelhof JP, Soltis PS, Soltis DE. Pure polyploidy: closing the gaps in autopolyploid research. J Syst Evol. 2017;55(4):340–52. doi: 10.1111/jse.12253 [DOI] [Google Scholar]
  • 58.Redmond AK, Casey D, Gundappa MK, Macqueen DJ, McLysaght A. Independent rediploidization masks shared whole genome duplication in the sturgeon-paddlefish ancestor. Nat Commun. 2023;14(1):2879. doi: 10.1038/s41467-023-38714-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Li Z, McKibben MTW, Finch GS, Blischak PD, Sutherland BL, Barker MS. Patterns and processes of diploidization in land plants. Annu Rev Plant Biol. 2021;72:387–410. doi: 10.1146/annurev-arplant-050718-100344 [DOI] [PubMed] [Google Scholar]
  • 60.Du K, Stöck M, Kneitz S, Klopp C, Woltering JM, Adolfi MC, et al. The sterlet sturgeon genome sequence and the mechanisms of segmental rediploidization. Nat Ecol Evol. 2020;4(6):841–52. doi: 10.1038/s41559-020-1166-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Mandáková T, Lysak MA. Post-polyploid diploidization and diversification through dysploid changes. Curr Opin Plant Biol. 2018;42:55–65. doi: 10.1016/j.pbi.2018.03.001 [DOI] [PubMed] [Google Scholar]
  • 62.Robertson FM, Gundappa MK, Grammes F, Hvidsten TR, Redmond AK, Lien S, et al. Lineage-specific rediploidization is a mechanism to explain time-lags between genome duplication and evolutionary diversification. Genome Biol. 2017;18(1):111. doi: 10.1186/s13059-017-1241-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Conant GC, Birchler JA, Pires JC. Dosage, duplication, and diploidization: clarifying the interplay of multiple models for duplicate gene evolution over time. Curr Opin Plant Biol. 2014;19:91–8. doi: 10.1016/j.pbi.2014.05.008 [DOI] [PubMed] [Google Scholar]
  • 64.Hokamp K, McLysaght A, Wolfe KH. The 2R hypothesis and the human genome sequence. J Struct Funct Genomics. 2003;3(1–4):95–110. [PubMed] [Google Scholar]
  • 65.Furlong RF, Holland PWH. Were vertebrates octoploid? Philos Trans R Soc Lond B Biol Sci. 2002;357(1420):531–44. doi: 10.1098/rstb.2001.1035 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Wolfe KH. Yesterday’s polyploids and the mystery of diploidization. Nat Rev Genet. 2001;2(5):333–41. doi: 10.1038/35072009 [DOI] [PubMed] [Google Scholar]
  • 67.Khalaf A, Francis O, Blaxter ML. Genome evolution in intracellular parasites: Microsporidia and Apicomplexa. J Eukaryot Microbiol. 2024;71(5):e13033. doi: 10.1111/jeu.13033 [DOI] [PubMed] [Google Scholar]
  • 68.Darwin Tree of Life Project Consortium. Sequence locally, think globally: the Darwin tree of life project. Proc Natl Acad Sci U S A. 2022;119(4):e2115642118. doi: 10.1073/pnas.2115642118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Weber CC, Paulini M, Blaxter ML. Kudoa genomes from contaminated hosts reveal extensive gene order conservation and rapid sequence evolution. Cold Spring Harbor Laboratory; 2024. doi: 10.1101/2024.11.01.621499 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Vancaester E, Blaxter M. Phylogenomic analysis of Wolbachia genomes from the Darwin Tree of Life biodiversity genomics project. PLoS Biol. 2023;21(1):e3001972. doi: 10.1371/journal.pbio.3001972 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Weber CC. Disentangling cobionts and contamination in long-read genomic data using sequence composition. G3 (Bethesda). 2024;14(11):jkae187. doi: 10.1093/g3journal/jkae187 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Vancaester E, Blaxter ML. MarkerScan: separation and assembly of cobionts sequenced alongside target species in biodiversity genomics projects. Wellcome Open Res. 2024;9:33. doi: 10.12688/wellcomeopenres.20730.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Eaton DAR. Toytree: a minimalist tree visualization and manipulation library for Python. Methods Ecol Evol. 2019;11(1):187–91. doi: 10.1111/2041-210x.13313 [DOI] [Google Scholar]
  • 74.Mautner SI, Cook KA, Forbes MR, McCurdy DG, Dunn AM. Evidence for sex ratio distortion by a new microsporidian parasite of a Corophiid amphipod. Parasitology. 2007;134(Pt 11):1567–73. doi: 10.1017/S0031182007003034 [DOI] [PubMed] [Google Scholar]
  • 75.Zhang C, Rabiee M, Sayyari E, Mirarab S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics. 2018;19(Suppl 6):153. doi: 10.1186/s12859-018-2129-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2. doi: 10.1093/bioinformatics/btv351 [DOI] [PubMed] [Google Scholar]
  • 77.Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37(5):1530–4. doi: 10.1093/molbev/msaa015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 2020;117(17):9451–7. doi: 10.1073/pnas.1921046117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0.2013-2015; 2013. Available from: http://www.repeatmasker.org
  • 80.Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36(9):2896–8. doi: 10.1093/bioinformatics/btaa025 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Desjardins CA, Sanscrainte ND, Goldberg JM, Heiman D, Young S, Zeng Q, et al. Contrasting host–pathogen interactions and genome evolution in two generalist and specialist microsporidian pathogens of mosquitoes. Nat Commun. 2015;6(1). doi: 10.1038/ncomms8121 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11(1):1432. doi: 10.1038/s41467-020-14998-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Pagel M. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proc R Soc Lond B. 1994;255(1342):37–45. doi: 10.1098/rspb.1994.0006 [DOI] [Google Scholar]
  • 84.Dia N, Lavie L, Faye N, Méténier G, Yeramian E, Duroure C, et al. Subtelomere organization in the genome of the microsporidian Encephalitozoon cuniculi: patterns of repeated sequences and physicochemical signatures. BMC Genomics. 2016;17:34. doi: 10.1186/s12864-015-1920-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Cormier A, Wattier R, Giraud I, Teixeira M, Grandjean F, Rigaud T, et al. Draft genome sequences of Thelohania contejeani and Cucumispora dikerogammari, pathogenic microsporidia of freshwater crustaceans. Microbiol Resour Announc. 2021;10(2):e01346-20. doi: 10.1128/MRA.01346-20 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Wadi L, Reinke AW. Evolution of microsporidia: an extremely successful group of eukaryotic intracellular parasites. PLoS Pathog. 2020;16(2):e1008276. doi: 10.1371/journal.ppat.1008276 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Bojko J, Becnel J, Bessette E, Edwards S, Gao J, Huang W-F, et al. Nosema or Vairimorpha: genomic/proteomic support to a complex socio-economic issue rooted in taxonomic change. J Invertebr Pathol. 2025;212:108376. doi: 10.1016/j.jip.2025.108376 [DOI] [PubMed] [Google Scholar]
  • 88.Bartolomé C, Higes M, Hernández RM, Chen YP, Evans JD, Huang Q. The recent revision of the genera Nosema and Vairimorpha (Microsporidia: Nosematidae) was flawed and misleads the bee scientific community. J Invertebr Pathol. 2024;206:108146. doi: 10.1016/j.jip.2024.108146 [DOI] [PubMed] [Google Scholar]
  • 89.Tokarev YS, Huang W-F, Solter LF, Malysh JM, Becnel JJ, Vossbrinck CR. A formal redefinition of the genera Nosema and Vairimorpha (Microsporidia: Nosematidae) and reassignment of species based on molecular phylogenetics. J Invertebr Pathol. 2020;169:107279. doi: 10.1016/j.jip.2019.107279 [DOI] [PubMed] [Google Scholar]
  • 90.Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30(14):3059–66. doi: 10.1093/nar/gkf436 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14(6):587–9. doi: 10.1038/nmeth.4285 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007;9(3):90–5. doi: 10.1109/mcse.2007.55 [DOI] [Google Scholar]
  • 93.Birky CW Jr. Heterozygosity, heteromorphy, and phylogenetic trees in asexual eukaryotes. Genetics. 1996;144(1):427–37. doi: 10.1093/genetics/144.1.427 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Shimodaira H. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 2002;51(3):492–508. doi: 10.1080/10635150290069913 [DOI] [PubMed] [Google Scholar]
  • 95.Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. doi: 10.1093/bioinformatics/bty191 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. doi: 10.1093/bioinformatics/btq033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. doi: 10.1093/bioinformatics/btp352 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Shen W, Le S, Li Y, Hu F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q File manipulation. PLoS One. 2016;11(10):e0163962. doi: 10.1371/journal.pone.0163962 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2016;44(D1):D7-19. doi: 10.1093/nar/gkv1290 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Barton AB, Pekosz MR, Kurvathi RS, Kaback DB. Meiotic recombination at the ends of chromosomes in Saccharomyces cerevisiae. Genetics. 2008;179(3):1221–35. doi: 10.1534/genetics.107.083493 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Jensen-Seaman MI, Furey TS, Payseur BA, Lu Y, Roskin KM, Chen C-F, et al. Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 2004;14(4):528–38. doi: 10.1101/gr.1970304 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Pointon DLB. Zenodo: sanger-tol/curationpretext; 2025. Available from: doi: 10.5281/zenodo.14983419 [DOI] [Google Scholar]
  • 103.PretextView: OpenGL Powered Pretext Contact Map Viewer. Github; Available from: https://github.com/sanger-tol/PretextView
  • 104.Chen H, Zwaenepoel A, Van de Peer Y. wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication. Bioinformatics. 2024;40(5):btae272. doi: 10.1093/bioinformatics/btae272 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, et al. Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A. 2005;102(15):5454–9. doi: 10.1073/pnas.0501102102 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Van de Peer Y. Computational approaches to unveiling ancient genome duplications. Nat Rev Genet. 2004;5(10):752–63. doi: 10.1038/nrg1449 [DOI] [PubMed] [Google Scholar]
  • 107.Blanc G, Wolfe KH. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 2004;16(7):1667–78. doi: 10.1105/tpc.021345 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Lynch M, Conery JS. The evolutionary demography of duplicate genes. Genome Evolution. Springer Netherlands; 2003. p. 35–44. doi: 10.1007/978-94-010-0263-9_4 [DOI] [PubMed] [Google Scholar]
  • 109.Schultz DT, Haddock SHD, Bredeson JV, Green RE, Simakov O, Rokhsar DS. Ancient gene linkages support ctenophores as sister to other animals. Nature. 2023;618(7963):110–7. doi: 10.1038/s41586-023-05936-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Konstantinidis KT, Tiedje JM. Genomic insights that advance the species definition for prokaryotes. Proc Natl Acad Sci U S A. 2005;102(7):2567–72. doi: 10.1073/pnas.0409727102 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.de Albuquerque NRM, Haag KL. Using average nucleotide identity (ANI) to evaluate microsporidia species boundaries based on their genetic relatedness. J Eukaryot Microbiol. 2023;70(2):e12944. doi: 10.1111/jeu.12944 [DOI] [PubMed] [Google Scholar]
  • 112.Hart R, Moran NA, Ochman H. Genomic divergence across the tree of life. Proc Natl Acad Sci U S A. 2025;122(10):e2319389122. doi: 10.1073/pnas.2319389122 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Ohno S. Evolution by gene duplication; 1970. Available from: https://books.google.com/books?hl=en&lr=&id=5SjqCAAAQBAJ&oi=fnd&pg=PA1&ots=MoU4vLG0Af&sig=7ZznL61U389lqqWhblOVWnIAK-k
  • 114.Szadkowski E, Eber F, Huteau V, Lodé M, Huneau C, Belcram H, et al. The first meiosis of resynthesized Brassica napus, a genome blender. New Phytol. 2010;186(1):102–12. doi: 10.1111/j.1469-8137.2010.03182.x [DOI] [PubMed] [Google Scholar]
  • 115.Udall JA, Quijada PA, Osborn TC. Detection of chromosomal rearrangements derived from homologous recombination in four mapping populations of Brassica napus L. Genetics. 2005;169(2):967–79. doi: 10.1534/genetics.104.033209 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Schranz ME, Osborn TC. De novo variation in life-history traits and responses to growth conditions of resynthesized polyploid Brassica napus (Brassicaceae). Am J Bot. 2004;91(2):174–83. doi: 10.3732/ajb.91.2.174 [DOI] [PubMed] [Google Scholar]
  • 117.Osborn TC, Butrulle DV, Sharpe AG, Pickering KJ, Parkin IAP, Parker JS, et al. Detection and effects of a homeologous reciprocal transposition in Brassica napus. Genetics. 2003;165(3):1569–77. doi: 10.1093/genetics/165.3.1569 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Song K, Lu P, Tang K, Osborn TC. Rapid genome change in synthetic polyploids of Brassica and its implications for polyploid evolution. Proc Natl Acad Sci U S A. 1995;92(17):7719–23. doi: 10.1073/pnas.92.17.7719 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Weiss H, Maluszynska J. Chromosomal rearrangement in autotetraploid plants of Arabidopsis thaliana. Hereditas. 2000;133(3):255–61. doi: 10.1111/j.1601-5223.2000.00255.x [DOI] [PubMed] [Google Scholar]
  • 120.Nakao Y, Kanamori T, Itoh T, Kodama Y, Rainieri S, Nakamura N, et al. Genome sequence of the lager brewing yeast, an interspecies hybrid. DNA Res. 2009;16(2):115–29. doi: 10.1093/dnares/dsp003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Usher J, Bond U. Recombination between homoeologous chromosomes of lager yeasts leads to loss of function of the hybrid GPH1 gene. Appl Environ Microbiol. 2009;75(13):4573–9. doi: 10.1128/AEM.00351-09 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Dunn B, Sherlock G. Reconstruction of the genome origins and evolution of the hybrid lager yeast Saccharomyces pastorianus. Genome Res. 2008;18(10):1610–23. doi: 10.1101/gr.076075.108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Shin H, Park JE, Park HR, Choi WL, Yu SH, Koh W, et al. Admixture of divergent genomes facilitates hybridization across species in the family Brassicaceae. New Phytol. 2022;235(2):743–58. doi: 10.1111/nph.18155 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.McClintock B. The significance of responses of the genome to challenge. Science. 1984;226(4676):792–801. doi: 10.1126/science.15739260 [DOI] [PubMed] [Google Scholar]
  • 125.Storchová Z, Breneman A, Cande J, Dunn J, Burbank K, O’Toole E, et al. Genome-wide genetic analysis of polyploidy in yeast. Nature. 2006;443(7111):541–7. doi: 10.1038/nature05178 [DOI] [PubMed] [Google Scholar]
  • 126.Duan H, Jones AW, Hewitt T, Mackenzie A, Hu Y, Sharp A, et al. Physical separation of haplotypes in dikaryons allows benchmarking of phasing accuracy in Nanopore and HiFi assemblies with Hi-C data. Genome Biol. 2022;23(1):84. doi: 10.1186/s13059-022-02658-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Andreadis TG. Amblyospora connecticus sp. nov. (Microsporida: Amblyosporidae): horizontal transmission studies in the mosquito Aedes cantator and formal description. J Invertebr Pathol. 1988;52(1):90–101. doi: 10.1016/0022-2011(88)90107-3 [DOI] [Google Scholar]
  • 128.Murareanu BM, Sukhdeo R, Qu R, Jiang J, Reinke AW. Generation of a microsporidia species attribute database and analysis of the extensive ecological and phenotypic diversity of microsporidia. mBio. 2021;12(3):e0149021. doi: 10.1128/mBio.01490-21 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Cali A, Takvorian PM. Developmental morphology and life cycles of the microsporidia. Microsporidia: pathogens of opportunity; 2014. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118395264.ch2
  • 130.Baker MD, Vossbrinck CR, Didier ES, Maddox JV, Shadduck JA. Small subunit ribosomal DNA phylogeny of various microsporidia with emphasis on AIDS related forms. J Eukaryot Microbiol. 1995;42(5):564–70. doi: 10.1111/j.1550-7408.1995.tb05906.x [DOI] [PubMed] [Google Scholar]
  • 131.Zhu X, Wittner M, Tanowitz HB, Kotler D, Cali A, Weiss LM. Small subunit rRNA sequence of Enterocytozoon bieneusi and its potential diagnostic role with use of the polymerase chain reaction. J Infect Dis. 1993;168(6):1570–5. doi: 10.1093/infdis/168.6.1570 [DOI] [PubMed] [Google Scholar]
  • 132.Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18(2):170–5. doi: 10.1038/s41592-020-01056-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Challis R, Richards E, Rajan J, Cochrane G, Blaxter M. BlobToolKit—interactive quality assessment of genome assemblies. G3 (Bethesda). 2020;10(4):1361–74. doi: 10.1534/g3.119.400908 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764–70. doi: 10.1093/bioinformatics/btr011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135.Vasimuddin Md, Misra S, Li H, Aluru S. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS); 2019. doi: 10.1109/ipdps.2019.00041 [DOI] [Google Scholar]
  • 136.Dudchenko O, Shamim MS, Batra SS, Durand NC, Musial NT, Mostofa R, et al. The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. Cold Spring Harbor Laboratory; 2018. doi: 10.1101/254797 [DOI] [Google Scholar]
  • 137.PretextMap: Paired REad TEXTure Mapper. Converts SAM formatted read pairs into genome contact maps. Github. Available from: https://github.com/sanger-tol/PretextMap
  • 138.Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 2021;3(1):lqaa108. doi: 10.1093/nargab/lqaa108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139.UniProt Consortium. UniProt: the universal protein knowledgebase in 2025. Nucleic Acids Res. 2025;53(D1):D609–17. doi: 10.1093/nar/gkae1010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140.Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 2005;33(20):6494–506. doi: 10.1093/nar/gki937 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141.Mackintosh A, de la Rosa PMG, Martin SH, Lohse K, Laetsch DR. Inferring inter-chromosomal rearrangements and ancestral linkage groups from synteny. Cold Spring Harbor Laboratory; 2023. doi: 10.1101/2023.09.17.558111 [DOI] [Google Scholar]
  • 142.Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30. doi: 10.5555/1953048.2078195 [DOI] [Google Scholar]
  • 143.van der Maaten L, Hinton GE. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–605. Available from: https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf [Google Scholar]

Decision Letter 0

Roland Roberts

20 May 2025

Dear Dr Khalaf,

Thank you for submitting your manuscript entitled "Forty New Genomes Shed Light on Sexual Reproduction and the Origin of Tetraploidy in Microsporidia" for consideration as a Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please log in to Editorial Manager, where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please log in to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by May 22 2025 11:59PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Roli Roberts

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Decision Letter 1

Roland Roberts

25 Jun 2025

Dear Dr Khalaf,

Thank you for your patience while your manuscript "Forty New Genomes Shed Light on Sexual Reproduction and the Origin of Tetraploidy in Microsporidia" was peer-reviewed at PLOS Biology. It has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by three independent reviewers.

You'll see that reviewer #1 is positive about the study, but thinks that the manuscript needs significant streamlining, especially for our broader readership. S/he also wants better treatment of the prior literature around sex in microsporidia, including population genetic support for this claim. There are also some presentational (to include a table of the genomes’ properties) and methodological (to justify the use of a prokaryotic annotation pipeline) requests. Reviewer #2 is also positive, but wants improved analysis and presentation of the phylogeny, expresses scepticism about the number of genes in some species, and has multiple queries about the methods (including how you excluded the co-occurrence of multiple microsporidia in a single host). Reviewer #3 starts very positive, but then becomes more critical, raising concerns about inferring ploidy without cytological data, and the overall lack of methodological clarity. Like reviewer #1, s/he also thinks that the paper needs streamlining.

In light of the reviews, which you will find at the end of this email, we would like to invite you to revise the work to thoroughly address the reviewers' reports.

Given the extent of revision needed, we cannot make a decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is likely to be sent for further evaluation by all or a subset of the reviewers.

In addition to these revisions, you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly.

We expect to receive your revised manuscript within 3 months. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may withdraw it.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note, while forming your response, that if your article is accepted you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roli Roberts

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

The paper by Khalaf and colleagues provides a comprehensive analysis of 40 microsporidian genomes derived from long-read sequencing datasets of infected animal hosts. The study is robust, with results that are generally well-supported by the provided data and analyses. Below, I offer suggestions to improve the paper's structure, readability, and scientific rigor, along with specific technical comments.

Note that the absence of line numbers and the placement of figure legends (separated from figures and embedded mid-text) made the review process unnecessarily challenging, and I recommend addressing this for clarity.

General Comments:

* Streamline Content for Conciseness and Broader Appeal: While the dataset is extensive, the paper could be significantly shortened by consolidating analyses addressing similar questions. For instance, the phylogenetic and species definition analyses could be combined into a single, concise paragraph, as they are closely related. Similarly, analyses of inter-species versus intra-haplotype rearrangement/recombination both explore genomic diversity in microsporidia and should be merged into one section. Reducing overly detailed discussions of niche interest (e.g., specific phylogenetic nuances) would enhance accessibility for the broader readership of PLoS Biology.

* Clarify "Compartments" Terminology: The use of "compartments" in the context of eukaryotic genomes typically refers to A/B compartments—genomic regions with distinct gene/repeat enrichments and conformational differences tied to the cell cycle. The Hi-C data presented suggests compartmentalization, but it's unclear whether these reflect standard A/B compartments, as seen in other eukaryotes, including symbiotic or pathogenic fungi. The authors should investigate and explicitly compare their findings to known A/B compartmentalization to avoid confusion.

* Acknowledge Prior Hypotheses on Microsporidian Sex and Ploidy: The paper's major claims, such as the proposed sexual reproduction model and diploid/tetraploid cycles, build on earlier hypotheses that are not cited. For example, Pelin et al. (2016) and Corradi (2015) discuss diploid-tetraploid cycles and potential karyogamy in microsporidia, suggesting mechanisms for genetic diversity. These seminal works must be referenced to contextualize the current findings. The authors should clarify whether their data supports, refines, or challenges these hypotheses. Additionally, claims of sexual reproduction should be substantiated with population genetics analyses, which are absent here but have been explored elsewhere (e.g., in studies not cited in this manuscript).

Specific Comments:

* Page 7, Compartmentalization: The statement, "We show that one tetraploid genome is organised into two compartments, likely the nuclei of the diplokaryon," requires clarification. Do these Hi-C-defined compartments show differential enrichment in transposable elements (TEs), BUSCO genes, or other features, as expected in A/B compartments? The authors should compare their findings to known eukaryotic compartmentalization to ensure accurate terminology.

* Page 9, Genome Data Accessibility: The paper lacks a clear summary of the 40 genomes' characteristics. I recommend including a table detailing genome statistics (e.g., size, scaffold number, N50), sequencing methods (e.g., Hi-C, telomere resolution), and quality metrics. Additionally, explain why Hi-C was applied to some genomes but not others and outline the sequencing-to-scaffolding pipeline to provide context for methodological choices.

* Page 11, Figure and Legend Placement: Embedding figure legends mid-text, with figures placed at the end, hinders readability. Figures and their legends should be integrated within the main text at relevant points to facilitate review and comprehension.

* Page 12, Annotation Pipeline: The bacterial annotation pipeline is surprising given microsporidia's eukaryotic nature. Although introns are rare and small, a eukaryotic annotation pipeline is standard and should be employed. Justify the use of a bacterial pipeline or revise the methodology.

* Page 12, Transposable Elements and rRNA: The paper notes correlations between retroelement, DNA transposon, and helitron loads with genome size but lacks detail on their genomic distribution. Are TEs or rRNA operons enriched in specific chromosomal regions, such as subtelomeres, as observed in Encephalitozoon? This should be explored and discussed.

* Page 13, Species Definitions: Defining species based on branch length is problematic, as it risks oversimplification (e.g., analogous to grouping marsupials and mammals as conspecific). Species delimitation in microsporidia should incorporate population genetics approaches to assess genetic exchange, as asexual lineages may not align with traditional species concepts. The authors should acknowledge these limitations and consider integrating population genetics data to support taxonomic claims.

* Page 15, Nomenclature: Designations like "idChiSpeb1.μ" are opaque and confusing. If a species name exists (e.g., for the sample from Chironomus sp.), it should be used consistently to improve clarity.

Reviewer #2:

Microsporidia are a large group of intracellular parasites that infect many species of animals. Their genomes have been of interest as they are the smallest known eukaryotic genomes, but their genomic structure has not been thoroughly investigated because only a few species have chromosome-level assemblies. Here the authors take advantage of genome reads from serendipitously infected insects to assemble 40 genomes, representing 17 species. The authors use these genome assemblies to investigate ploidy and genome structure, and to provide evidence for sexual reproduction. Overall, this is an interesting paper and the high-quality microsporidia genome assemblies will be a useful resource.

Major points:

1. The authors' claim "They allowed us to revise microsporidian phylogeny, notably changing the position of Glugeida to the sister group of the ancestor of the 'orphan lineage' and Amblyosporida" is unsupported by the provided data. Support values need to be provided for the nodes of the tree in figure 2, and ideally more than one phylogenetic approach would be used. The authors use terms like "clustering robustly" and "We confidently placed", but it is unclear what these descriptions are supported by. The authors' choice to use metchnikovellids as the root is not explained, as most groups show that Mitosporidium daphniae is sister to the metchnikovellids and microsporidia. The nodes are also very difficult to see as they are compressed. I would recommend creating a separate version of this tree for the main figure that is just species and not strains. The full tree can then be a supplemental figure.

2. The numbers of predicted genes shown in figure 2 seem unreasonably high. Some species of microsporidia are reported to have fewer than 2000 protein-coding genes, so reporting at least 10 species as having over 20,000 proteins is unexpected. What is the explanation for this? Are different variants of the same gene being predicted multiple times? Or are these just very short proteins? Other methods besides Prokka that have previously been used on microsporidia genomes should be investigated (see the following paper for examples: DOI: 10.1111/jeu.13038). The authors state that "no functional annotations were required in this piece of work", but gene prediction affects analyses such as those in figure 4a and figure 8, in which some species have 100 gene pairs vs 1000 gene pairs. Also, for these genomes to be a useful resource, the annotations along with the genome assemblies should be deposited in a central repository such as NCBI.

3. There are very high levels of duplicated buscos (up to close to 100%) in some assemblies. Is this just because some of the assemblies are haploid resolved? This needs to be explained.

4. How the paper deals with purged genomes is confusing. A better explanation of how and why genomes were purged would be helpful to add.

5. The authors do not appear to take any steps to confirm that the microsporidia reads from their samples come from a single species. Co-occurrence of different microsporidia species in mosquitoes has been shown to be ~10% (https://doi.org/10.1111/1755-0998.13205). For the non-chromosomal level genomes, some criteria need to be used to determine whether the microsporidia reads in each sample are from the same species.

6. The criteria used to classify ploidy in an assembly need to be defined. For example, why is the ploidy of iyOphElle1.µ not determined, as this is a chromosome-level assembly?

7. The number of genome assemblies that were screened using blobtools for microsporidia reads is not mentioned and should be.

Minor points:

1. I would recommend against using µ to name the novel microsporidia species genomes. There will certainly be instances where others will use these genomes in applications where Greek letters cannot be used, and this could lead to confusion. This is a known issue in biology nomenclature; see the following paper for this recommendation in the context of gene names: DOI: 10.1006/geno.2002.6748. I would suggest using some other abbreviation such as "m", "micro", or "ms" that makes the same point using the Latin alphabet.

2. Methods are lacking for the primers and genes that were used to identify individuals as microsporidia positive.

3. In figure 1A "number of microsporidia genomes" would be more appropriate as the X-axis label and moved underneath the X-axis.

4. Several acronyms and abbreviations are used that are not defined such as MFP, Asm., and OUT.

5. The paper often uses a lot of jargon that makes the paper less accessible, such as "Operational taxonomic unit classification of species suggests autotetraploidy in Microsporidia". It would be helpful to revise the paper with an eye towards clarity and simplicity that would help non-specialists be able to understand.

6. The scale in figure 3 is too compressed to see the branch lengths of 0.15 that the authors point to as delineating species. It would be helpful to use a different scale that allows this level of difference to be visualized.

7. What is being considered an OTU in this paper is not explained.

8. The text in figure 4A is too small to read and needs to be increased to at least 8 pt.

9. In figure 5 the clade names at the bottom are upside down and should be flipped so they can be more easily read.

10. A scale needs to be shown for figure 7.

Reviewer #3:

Khalaf et al. report on 40 new Microsporidia genomes that were sequenced incidentally as part of the Darwin Tree of Life project. The authors generated rather intact, complete genomes of these parasites and used Hi-C to assess contacts between homologous chromosomes in a genome. The authors use the data to address a large number of questions in Microsporidia biology, such as phylogeny, synteny, and most interestingly the evolution of tetraploidy. The authors show that tetraploids are autopolyploids rather than allopolyploids. A novel hypothesis that the two nuclei inside diplokaryotic cells are actually in some cases tetraploids divided into diploid nuclear compartments is proposed. Overall, this study provides new insights into the biology of these fascinating parasites and is easy to read.

However, I have a few major criticisms that dampen my enthusiasm. Firstly, this paper really rests on really accurate estimation of ploidy, which is tricky without cytology. For example, it is impossible to distinguish a haploid from a homozygous diploid or tetraploid. The presentation of how ploidy was assigned to these genomes was not sufficient to evaluate how robust the assignment was. Some of this comes up in Figure 9, where there is a mixed signal which the authors ascribe to segmental duplication. To be clear, I don't think the authors did a poor job in their assignments, I just think they need to be clearer about how they were done and whether there was any ambiguity. A second issue relates to the Hi-C data. Here, the authors could do a better job in walking us through what is happening. The figure itself is not self explanatory, and while their hypothesis is interesting, it's not clear how well it is supported by the data. How much uncertainty is there in this diplokaryotic model? Third, the authors run through many different analyses that are interesting, but again require really solid ploidy estimation and phasing/assembly of data into homeologues, such as the bizarre finding of an uneven and rearranged tetraploid.

I have a few other general comments, but again the paper is pretty cleanly written.

1. The authors should probably streamline some of the Results/presentation. For example Figure 1B is not a big result and there is not much they can say about it. Is it worth even mentioning?

2. A new way of defining species is presented using amino acid substitutions per site. This seems more complicated than the recent analysis by Albuquerque and Haag, doi: 10.1111/jeu.12944, using ANI. I'm not sure the field needs a more complicated method that involves more variation in molecular rates.

3. Figure 4 only uses tetraploids. However, if allopolyploidy occurred between two species that were diploid but homozygous, it could look the same as a diploid.

4. Without evidence of deep coalescence between homeologues, is the section "No evidence of recent rediploidisation in Microsporidia" likely to find anything?

5. Figure 12 could use a little polishing as it shows meiosis yielding two products and gametes undergoing mitosis. These simplifications could end up confusing readers.

Decision Letter 2

Roland Roberts

29 Sep 2025

Dear Dr Khalaf,

Thank you for your patience while we considered your revised manuscript "Forty New Genomes Shed Light on Sexual Reproduction and the Origin of Tetraploidy in Microsporidia" for publication as a Research Article at PLOS Biology. This revised version of your manuscript has been evaluated by the PLOS Biology editors, the Academic Editor, and the original reviewers.

Based on the reviews, we are likely to accept this manuscript for publication, provided you satisfactorily address the remaining points raised by the reviewers and the following data and other policy-related requests.

IMPORTANT - please attend to the following:

a) Please address the remaining points from the reviewers...

b) ...of which the Academic Editor said "I think that reviewer #2 has a valid point that it can be hard to discern if these are clonal vs polyclonal infections. I don't think it takes away from their major conclusions to add this as a caveat given that extent of analysis. As for the protein predictions, I think you can ask that they also address this and ensure that the methods they have used are sufficiently documented that this can be repeated by others working in the field." I've included that in case it's helpful.

c) Please address my Data Policy requests below; specifically, we need you to supply the numerical values underlying Figs 1, 2ABCDE, 3AB, 4, 5, 6A, 7, 8ABC, 9AB, and all of the Supp Figs, either as a supplementary data file or as a permanent DOI’d deposition. I note that you already have an associated Zenodo deposition (https://doi.org/10.5281/zenodo.15364388). Please could you confirm whether the data and code in this deposition are sufficient to recreate the Figures (main and supplementary)?

d) Please cite the location of the data clearly in all relevant main and supplementary Figure legends, e.g. “The data underlying this Figure can be found in S1 Data” or “The data underlying this Figure can be found in https://zenodo.org/records/15364388”

e) I note that (unlike in earlier versions) the supplementary Figs and Tables are currently only accessible as part of very large zipped folders – please submit these as separate files, and include their legends in the manuscript file. It’s fine for the other files (non-Fig, non-Table) to simply be provided in the Zenodo deposition.

f) Please make any custom code available, either as a supplementary file or as part of your Zenodo deposition.

g) Please include the URLs of your funders in the Financial Disclosure statement.

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

In addition to these revisions, you may need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly. If you do not receive a separate email within a few days, please assume that checks have been completed, and no additional changes are required.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable; if not applicable, please do not delete your existing 'Response to Reviewers' file)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://plos.org/published-peer-review-history/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

Please do not hesitate to contact me should you have any questions.

Sincerely,

Roli Roberts

Roland Roberts, PhD

Senior Editor

rroberts@plos.org

PLOS Biology

------------------------------------------------------------------------

DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: Figs 1, 2ABCDE, 3AB, 4, 5, 6A, 7, 8ABC, 9AB, and all of the Supp Figs. NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

IMPORTANT: Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

------------------------------------------------------------------------

CODE POLICY

Per journal policy, if you have generated any custom code during the course of this investigation, please make it available without restrictions. Please ensure that the code is sufficiently well documented and reusable, and that your Data Statement in the Editorial Manager submission system accurately describes where your code can be found.

Please note that we cannot accept sole deposition of code in GitHub, as this could be changed after publication. However, you can archive this version of your publicly available GitHub code to Zenodo. Once you do this, it will generate a DOI number, which you will need to provide in the Data Accessibility Statement (you are welcome to also provide the GitHub access information). See the process for doing this here: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content

------------------------------------------------------------------------

DATA NOT SHOWN?

- Please note that per journal policy, we do not allow the mention of "data not shown", "personal communication", "manuscript in preparation" or other references to data that is not publicly available or contained within this manuscript. Please either remove mention of these data or provide figures presenting the results and the data underlying the figure(s).

------------------------------------------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

No comments.

Reviewer #2:

Although the authors do address some of my concerns, there are some that they don't. My biggest remaining concern is the lack of protein predictions and annotations. Although the authors don't use genome-wide protein predictions in their analysis, a large part of the value of a paper that assembles novel genomes is that the genomes are used as a resource, and the prediction and annotation of proteins is a large part of that. I also don't understand the data on the protein prediction numbers the authors provide. It seems like AUGUSTUS is predicting 0 proteins for many of the genomes. Presumably for their BUSCO analysis they are relying on AUGUSTUS predictions, so it's unclear to me why their predictions are so low. My other remaining concerns are listed below.

1. The authors claim that the genomes are clonal infections but they still don't provide convincing data to support that. Their marker scan approach relies on 1000 bp long sequences (this is from the cited paper; the authors should include this information in their methods), and if only 10 percent of the genome is sequenced, this might result in no ribosomal sequence or only a fragmented ribosomal sequence. Their genome scan results also likely wouldn't be able to detect if only 10 percent of the genome was from a different species. Their BlobToolKit analysis could potentially deal with a partial co-occurring genome. The authors state: "A mixed infection of multiple microsporidians would display a distinct pattern in the BlobToolKit plots - especially given that the two (or more) microsporidians would likely differ in read depth (because of different levels of infection)." Here are a couple of examples of non-chromosomal genomes and the range of coverages that they see. Do the authors have evidence that a co-infection that is at 10% of the main infection would not have any contigs included within the low range of coverage?

iyAmbProt1.µ 2.219 - 41.448

ihCicViri2.µ 21.3 - 345.6

Although these genomes may indeed be from single clonal infections, I do not believe the authors have provided evidence to support their new claim "we determined that all of our forty microsporidian genomes are derived from single, clonal infections."

I do think this is a hard problem and the authors do not necessarily have to solve it, but if the authors are not going to convincingly address this, adding a sentence to the discussion or methods about the limitation of analysing microsporidia genomes from metagenomic data would be helpful.

2. The authors are still rooting their tree based on Metchnikovella, without an explanation why. This doesn't affect their results, but displaying the tree this way will be confusing as Mitosporidium is considered the outgroup to Metchnikovella and the microsporidia.

Reviewer #3:

I have evaluated the revision, and I think the authors have done a decent job of addressing the previous comments. I note that the responses are somewhat hard to follow because the line numbers do not jibe with the rebuttal.

I think the authors should keep in mind that it is actually very easy to have a situation where you have a completely homozygous diploid. There are many eukaryotes where haploid selfing occurs and generates completely homozygous diploids. They do acknowledge that a completely homozygous diploid is impossible to differentiate from a completely homozygous tetraploid or haploid. I also stand behind my statement about ANI, but we can disagree on this opinion and this doesn't influence the results in this paper.

Congratulations to the authors.

Decision Letter 3

Roland Roberts

3 Oct 2025

Dear Dr Khalaf,

Thank you for the submission of your revised Research Article "Forty New Genomes Shed Light on Sexual Reproduction and the Origin of Tetraploidy in Microsporidia" for publication in PLOS Biology. On behalf of my colleagues and the Academic Editor, Joseph Heitman, I'm pleased to say that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely,

Roli Roberts

Roland G Roberts, PhD, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Microsporidian genome assembly statistics and host metadata.

    Full list of recovered microsporidian genome assemblies, their associated metadata, and host metadata.

    (XLSX)

    pbio.3003446.s001.xlsx (40.1KB, xlsx)
    S2 Table. Filtering parameters used in generating genome assemblies.

    Parameters used for filtering microsporidian contigs from their respective (meta-)genomic assemblies in filtering steps 1 (BlobToolKit [133]) and 2 (BubblePlot, Github: https://github.com/Amjad-Khalaf/BubblePlot). See Materials and methods for details.

    (XLSX)

    pbio.3003446.s002.xlsx (7.4KB, xlsx)
    S3 Table. Trait-phylogeny regression.

    Transformations representing the fit with the tree’s topology (λ), branch-lengths (κ), and root–tip distance (δ) [83] and the number of coding sequences, transposable element loads, and genome spans.

    (XLSX)

    pbio.3003446.s003.xlsx (7.8KB, xlsx)
    S4 Table. Trait correlation.

    Correlations between transposable element loads and genome spans.

    (XLSX)

    pbio.3003446.s004.xlsx (7.9KB, xlsx)
    S5 Table. Accession numbers for publicly available genomes used in this study.

    On the 1st of January 2025, we downloaded all microsporidian genome assemblies available in the NCBI Genome database. This retrieved the following 106 genome assemblies.

    (XLSX)

    pbio.3003446.s005.xlsx (6.7KB, xlsx)
    S6 Table. Branch length distances for species delineation.

    Pairwise branch length distances which include one of our genomes, and can be classified to a species or a genus. The conservative branch length threshold range was defined using the shortest observed branch lengths between known same-species genomes for the lower bound (0) and the smallest distance between H. tvaerminnensis and H. magnivora genomes for the upper bound (0.012). The relaxed threshold uses the full range of observed branch lengths among known same-species genomes (excluding the H. tvaerminnensis–H. magnivora cutoff).

    (XLSX)

    pbio.3003446.s006.xlsx (16.8KB, xlsx)
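As a rough illustration of how the conservative range in the S6 Table legend could be applied, the sketch below labels a single pairwise branch length. It is not the authors' code. The function name `classify_pair` and the `relaxed_upper` parameter are hypothetical, the relaxed bound is not stated in the legend and must be supplied by the reader, and treating 0.012 as an exclusive bound is an assumption made here because that value is itself a distance between two named species.

```python
# Conservative bounds quoted in the S6 Table legend
# (amino acid substitutions per site).
CONSERVATIVE_LOWER = 0.0
CONSERVATIVE_UPPER = 0.012


def classify_pair(distance, relaxed_upper=None):
    """Label one pairwise branch length against the delineation thresholds.

    relaxed_upper is a hypothetical parameter: the legend gives no number
    for the relaxed bound, so the caller has to take it from S6 Table.
    """
    if distance < CONSERVATIVE_LOWER:
        raise ValueError("branch lengths cannot be negative")
    # Assumption: the upper bound is exclusive, since 0.012 is the smallest
    # distance observed between two different named species.
    if distance < CONSERVATIVE_UPPER:
        return "consistent with same species (conservative)"
    # Distances past the conservative bound can still fall in the relaxed range.
    if relaxed_upper is not None and distance <= relaxed_upper:
        return "consistent with same species (relaxed only)"
    return "consistent with different species"
```

Calling `classify_pair(0.005)` falls inside the conservative range, while a distance of exactly 0.012 is treated as outside it under the exclusive-bound assumption.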
    S1 Text. Newick string of phylogeny.

    ASTRAL [75] phylogeny summarizing individual phylogenies of 600 BUSCO genes (microsporidia_odb10) [76] across all publicly available microsporidian genome assemblies (n = 106), and the genome assemblies generated in this study (n = 40, marked in purple). Branch lengths were estimated with IQ-TREE using a concatenated alignment of the individual BUSCOs [77]. The model chosen according to IQ-TREE’s model finder was “Q.yeast.I.G4”.

    (TXT)

    pbio.3003446.s007.txt (8.4KB, txt)
    S2 Text. Approximately Unbiased phylogenetic test results.

    BUSCO (microsporidia_odb10, version 5.4.6) (Simão and colleagues 2015) was run on the unpurged genome assemblies of the tetraploid genomes. The haplotypes of each BUSCO locus were aligned to one another using MAFFT (version 7.525) (Katoh and colleagues 2002), and a phylogeny was generated for each alignment using IQ-TREE (version 2.3.4, with ModelFinder enabled and 1,000 bootstrap replicates) (Minh and colleagues 2020; Kalyaanamoorthy and colleagues 2017). The Approximately Unbiased statistical test [90] was then run on the multi-copy BUSCO gene phylogenies for all pairwise combinations of tetraploid microsporidian genomes. A high-level summary of these pairwise tests is included in this text. For each listed pairwise comparison, the “+” sign indicates the number of phylogenies where haplotypes coalesce more recently than species, and the “−” sign indicates the number of phylogenies where species coalesce more recently than haplotypes.
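
    The “+” and “−” categories summarized in S2 Text can be illustrated with a minimal Python sketch. This is not the Approximately Unbiased test itself, which compares the likelihoods of alternative topologies; it only shows what the two categories mean for a single rooted gene tree. The nested-tuple tree format, the “species|haplotype” leaf labels, and all function names are assumptions made for illustration.

```python
from typing import Union

# Hypothetical toy format: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> frozenset:
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> set:
    """Return the leaf set of every node in a rooted tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def species_of(leaf: str) -> str:
    """Leaf labels are assumed to look like 'species|haplotype'."""
    return leaf.split("|")[0]


def haplotypes_coalesce_first(tree: Tree) -> bool:
    """True if the haplotypes of every species form their own clade ('+')."""
    all_clades = clades(tree)
    by_species = {}
    for leaf in leaves(tree):
        by_species.setdefault(species_of(leaf), set()).add(leaf)
    return all(frozenset(group) in all_clades for group in by_species.values())


def tally(trees: list) -> dict:
    """Count '+' and '-' topologies; anything that is not '+' is counted as '-'."""
    counts = {"+": 0, "-": 0}
    for tree in trees:
        counts["+" if haplotypes_coalesce_first(tree) else "-"] += 1
    return counts
```

    In this simplification every tree that is not strictly “+” is counted as “−”, so intermediate topologies are not distinguished.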

    (TXT)

    pbio.3003446.s008.txt (33.8KB, txt)
    S3 Text. Details on rearrangements inferred and methods attempted.

    (TXT)

    pbio.3003446.s009.txt (2.1KB, txt)
    S1 Script. Plotting histograms depicting phylogenetic branch lengths (in amino acid substitutions per site) between homeologous gene pairs for 13 tetraploid genomes.

    Python script used to extract pairwise branch lengths between homeologous gene pairs for 13 tetraploid genomes, and plot them as histograms. Please note that the following genomes are represented by deprecated ToLIDs, which differ from the ones used in this manuscript. iyOecSmar33: idDelPlat3; iyOecSmar35: idTanUsma1; iyOecSmar39: idChiSpeb1; iyOecSmar41: idDelPlat4; and iyOecSmar44: idDelPlat5.

    (PY)

    pbio.3003446.s010.py (3.3KB, py)
    S1 Fig. Sex of hosts the microsporidian genome assemblies are derived from.

    The sex of our genomes’ hosts was unknown in most cases (24 species). In the remaining cases, nine were identified as female and seven as male. A relatively equal proportion of female and male hosts are infected with Nosematida (Fig 1), but we could not assess skews in host sex ratios for other microsporidian groups due to missing data on sex (for Amblyosporida-infected hosts), or a small sample size (for Neopereziida-infected hosts). The data underlying this figure can be found in S1 Table. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

    (PNG)

    pbio.3003446.s011.png (14.9MB, png)
    S2 Fig. Ploidy inference examples for three microsporidian genomes, highlighting segmental duplications.

    GenomeScope2 transformed linear plot and Smudgeplot [82], respectively, for (A), (B) diploid iyCepSpin2.µ (host Cephus spinipes [Hymenoptera]); (C), (D) diploid idPhaFune2.µ (host Phania funesta [Diptera]); and (E), (F) polyploid (tetraploid or octoploid) iiMysAzur1.µ (host Mystacides azureus [Trichoptera]). Jellyfish was used to generate the initial k-mer spectra (k = 21, version 2.2.10) [134]. Both iyCepSpin2.µ and idPhaFune2.µ have mostly diploid genomes, but carry a level of duplication that generated an identifiable “tetraploid” signal in their k-mer spectra. Similarly, the k-mer spectrum of iiMysAzur1.µ can be interpreted as either a highly homozygous tetraploid where large segmental duplications have occurred in all four copies leading to a detectable octoploid signal, or an octoploid genome composed of two distinct tetraploids. Such cases are common, with some level of segmental duplication observed in nearly all of the 14 polyploid genomes (refer to File Collection 1 at https://doi.org/10.5281/zenodo.17251512). The genomes used to generate this figure can be found in File Collection 12 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using GenomeScope2 [82], and manually annotated using InkScape (version 1.2.2).

    (PNG)

    pbio.3003446.s012.png (3.5MB, png)
    S3 Fig. 600 Gene Phylogeny of Microsporidia.

    (A) ASTRAL [75] phylogeny summarizing individual phylogenies of 600 BUSCO genes (microsporidia_odb10) [76] across all publicly available microsporidian genome assemblies (including multiple strains where they are available), and the genome assemblies generated in this study (n = 40, marked in purple). Branch lengths were estimated with IQ-TREE using a concatenated alignment of the individual BUSCOs [77]. Nodes with less than 95% support are marked with pink circles. Ploidy is marked in circles at the tips of the tree for genomes where it was characterizable. (B) Genome assembly span (Mb) as calculated by assembly-stats (Github: https://github.com/sanger-pathogens/assembly-stats), with black circles marking chromosome-level genome assemblies. (C) N50 values (Mb) as calculated by assembly-stats (Github: https://github.com/sanger-pathogens/assembly-stats), with asterisks marking purged genome assemblies. (D) BUSCO gene (microsporidia_odb10) completeness percentage, marked in green for single-copy genes, and beige for duplicated genes. (E) Transposable element percentage as predicted by RepeatModeler and RepeatMasker [78,79], marked in burgundy for retroelements, peach for DNA transposons, and blue for rolling circles. Neop.: Neopereziida; Or. Lin.: Orphan Lineage. The data underlying A can be found in S1 Text. The data underlying B, C, D, and E can be found in S1 Table. The figure was generated using ToyTree [73], and manually annotated using InkScape (version 1.2.2).

    (ZIP)

    pbio.3003446.s013.zip (5.8MB, zip)
    S4 Fig. Comparison of whole-genome phylogeny species delineation thresholds and individual gene phylogeny branch length distribution species delineation thresholds.

    The approach we presented in the main text relies on branch lengths derived from the whole-genome phylogeny in Fig 2 (i.e., a concatenated supermatrix of genes). We re-estimated same-species branch length thresholds for each gene. For each gene, we used the distribution of branch lengths between genomes known to belong to the same species, and measured each distribution’s mean and 95th percentile. The upper threshold was then set by retrieving the highest observed 95th percentile (orange dashed line) and the highest observed mean (magenta dashed line). While the percentage of genes exceeding each threshold varies for each genome, they are relatively consistent, and lead to the same OTU assignment and the same conclusions when investigating tetraploid species. ilAceEphe1.µ still stands out as possessing more genes which exceed the same-species threshold (no matter what threshold was used) than other genomes. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).
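
    The per-gene thresholds described for S4 Fig can be sketched as follows. This is a minimal illustration assuming branch lengths have already been extracted into plain dictionaries; the function names, the input layout, and the linear-interpolation percentile are assumptions and not the authors' implementation.

```python
from statistics import mean


def percentile(values, q):
    """q-th percentile (0-100), using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    rank = (len(ordered) - 1) * q / 100
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def upper_thresholds(same_species):
    """same_species maps gene -> branch lengths between known same-species genomes.

    Returns (highest 95th percentile, highest mean) observed across genes.
    """
    highest_p95 = max(percentile(v, 95) for v in same_species.values())
    highest_mean = max(mean(v) for v in same_species.values())
    return highest_p95, highest_mean


def percent_exceeding(query, threshold):
    """query maps gene -> branch length for one genome comparison.

    Returns the percentage of genes whose branch length exceeds the threshold.
    """
    above = sum(1 for distance in query.values() if distance > threshold)
    return 100 * above / len(query)
```

    Calling `upper_thresholds` once on the same-species distributions and then `percent_exceeding` per genome, once per threshold, gives the two per-genome percentages compared in the figure.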

    (PNG)

    pbio.3003446.s014.png (215.6KB, png)
    S5 Fig. Relationship between whole-genome phylogeny species delineation thresholds and individual gene phylogeny branch length distribution species delineation thresholds.

    We compared our two gene-based metrics (highest 95th percentile and highest mean of branch length distributions of individual gene trees for genomes known to belong to the same species) to the whole-genome-based metric (highest branch length observed between any two same species genomes). We found the relationship between them to be consistent and linear, in line with the fact that they lead to the same conclusions. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

    (PNG)

    pbio.3003446.s015.png (118.2KB, png)
    S6 Fig. Tetraploid ilAceEphe1.µ is uneven and rearranged.

    The number of BUSCO genes found in X haplotypes, along with their total copy number. idChiSpeb1.µ is an even tetraploid, so nearly all its BUSCO genes are in 4 copies, distributed across 4 haplotypes. On the other hand, ilAceEphe1.µ is an uneven tetraploid. The majority of its BUSCO genes are in fewer than 4 copies, and they are not evenly distributed across its haplotypes. For instance, some BUSCO genes occur in 3 copies present only in a single haplotype. The figure was generated using gerbil (Github: https://github.com/Amjad-Khalaf/gerbil), and manually annotated using InkScape (version 1.2.2).
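
    The tabulation behind S6 Fig (haplotypes occupied versus total copy number per BUSCO gene) can be illustrated with a short sketch. It does not reproduce gerbil; the input layout, in which each BUSCO identifier maps to one haplotype label per copy, and the function name are assumptions.

```python
from collections import Counter


def copy_distribution(busco_hits):
    """busco_hits maps BUSCO id -> list of haplotype labels, one entry per copy.

    Returns a Counter keyed by (haplotypes occupied, total copies).
    """
    return Counter(
        (len(set(haplotypes)), len(haplotypes))
        for haplotypes in busco_hits.values()
    )
```

    Under this layout, an even tetraploid would concentrate almost all genes in the (4, 4) cell, whereas a gene present in 3 copies on a single haplotype would fall in the (1, 3) cell.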

    (PNG)

    pbio.3003446.s016.png (164KB, png)
    S7 Fig. Phylogeny used by Syngraph, with its internal node labeling.

    Each node is labeled with its Syngraph name in a gray box. Yellow boxes indicate the number of chromosomes each genome possesses, and blue boxes indicate the number of chromosomes which possess BUSCO gene markers. The figure was generated using ToyTree [73], and manually annotated using InkScape (version 1.2.2).

    (PNG)

    pbio.3003446.s017.png (380.2KB, png)
    S8 Fig. Number of chromosomes inferred at each node is highly variable.

    The number of chromosomes inferred for each node, and the total number of BUSCO genes assigned to a chromosome, for each value of “m”. “m” is the Syngraph parameter that sets the minimum number of genes that must move together for an event to be counted as a rearrangement. For example, if m = 3, only rearrangements involving 3 or more genes will be counted. Deep nodes are highly variable and their karyotype (and thus the number of rearrangements that have occurred along each branch) cannot be estimated reliably. See S7 Fig for node labels on the phylogeny. The figure was generated using Matplotlib [92], and manually annotated using InkScape (version 1.2.2).
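
    The effect of the minimum-gene threshold can be shown with a toy example. This sketch only illustrates the counting rule stated in the caption (events involving at least m genes are counted); it is not Syngraph's algorithm, and the event representation and function names are hypothetical.

```python
def count_rearrangements(events, m):
    """events is a list of gene-id lists, one per candidate rearrangement.

    Only events involving at least m genes are counted.
    """
    return sum(1 for genes in events if len(genes) >= m)


def sweep(events, m_values):
    """Return {m: number of counted rearrangements} for each requested m."""
    return {m: count_rearrangements(events, m) for m in m_values}
```

    Sweeping m in this way shows how quickly the count falls as the threshold rises, which is one way to see why inferences that depend strongly on m are hard to interpret.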

    (PNG)

    pbio.3003446.s018.png (577.5KB, png)
    S9 Fig. T-SNE plot depicting BUSCO linkage groups across the microsporidian phylogeny.

    Each point represents a BUSCO gene, positioned based on its co-occurrence profile across the chromosome-level microsporidian genomes. Distances between points reflect similarities in co-occurrence. Points are coloured by their assigned chromosome in Antonospora locustae. This disorganized pattern illustrates that the rate of rearrangement is too high for a reliable complete reconstruction of putative ancestral linkage groups. The large-scale patterns are influenced by more densely sampled taxa, see S10 Fig. The data underlying this figure can be found in File Collection 5 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using Scikit-learn [142,143] and Matplotlib [92], and manually annotated using InkScape (version 1.2.2).
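
    One plausible way to build a co-occurrence profile of the kind described above is sketched here. The exact definition used for the figure is not given in this caption, so the definition below (the fraction of genomes in which two BUSCO genes share a chromosome), the input layout, and the function name are all assumptions.

```python
def cooccurrence_profiles(assignments):
    """assignments maps genome -> {BUSCO id -> chromosome}.

    Returns (genes, matrix), where matrix[i][j] is the fraction of genomes
    carrying both genes in which they sit on the same chromosome.
    """
    genes = sorted({g for chrom_of in assignments.values() for g in chrom_of})
    matrix = []
    for a in genes:
        row = []
        for b in genes:
            # Only genomes where both genes were placed on a chromosome count.
            shared = [c for c in assignments.values() if a in c and b in c]
            together = sum(1 for c in shared if c[a] == c[b])
            row.append(together / len(shared) if shared else 0.0)
        matrix.append(row)
    return genes, matrix
```

    Each row of the matrix is one gene's profile; the rows could then be passed to a t-SNE implementation such as `sklearn.manifold.TSNE` to obtain two-dimensional coordinates for plotting.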

    (PNG)

    pbio.3003446.s019.png (124.8KB, png)
    S10 Fig. T-SNE plot depicting BUSCO linkage groups across the microsporidian phylogeny, highlighting clustering influence by more densely sampled taxa.

    Each point represents a BUSCO gene, positioned based on its co-occurrence profile across the chromosome-level microsporidian genomes. Distances between points reflect similarities in co-occurrence. Points are coloured by their assigned chromosome in Encephalitozoon cuniculi. This disorganized pattern illustrates that the rate of rearrangement is too high for a reliable complete reconstruction of putative ancestral linkage groups. The large-scale patterns are influenced by more densely sampled taxa, such as Encephalitozoon cuniculi. The data underlying this figure can be found in File Collection 5 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using Scikit-learn [142,143] and Matplotlib [92], and manually annotated using InkScape (version 1.2.2).

    (PNG)

    pbio.3003446.s020.png (108.8KB, png)
    S11 Fig. Synteny plots of chromosomal microsporidian genome assemblies.

    Genome-wide synteny plots of all available chromosomal microsporidian genome assemblies. Each line represents a single-copy BUSCO (microsporidia_odb10) [76]. BUSCOs are painted by their chromosomal position in A. locustae. The data underlying this figure can be found in File Collection 5 at https://doi.org/10.5281/zenodo.17251512. The figure was generated using ribbon plot scripts from https://github.com/conchoecia/odp [109] and ToyTree [73], and manually annotated using InkScape (version 1.2.2).

    (PNG)

    pbio.3003446.s021.png (27.2MB, png)
    S12 Fig. Synteny plots of chromosomal microsporidian genome assemblies.

    Genome-wide synteny plots of all available chromosomal microsporidian genome assemblies. Each line represents a single-copy BUSCO (microsporidia_odb10) [76]. BUSCOs are painted by their chromosomal position in H. tvaerminnensis. The figure was generated using ribbon plot scripts from https://github.com/conchoecia/odp [109] and ToyTree [73], and manually annotated using InkScape (version 1.2.2).

    (PNG)

    pbio.3003446.s022.png (26.2MB, png)
    Attachment

    Submitted filename: Response_to_Reviewers.docx

    pbio.3003446.s025.docx (362.8KB, docx)
    Attachment

    Submitted filename: Response to Reviewers and Addressing Policy Concerns.pdf

    pbio.3003446.s026.pdf (151.1KB, pdf)

    Data Availability Statement

    All relevant data and Supporting information are available in full on Zenodo at https://doi.org/10.5281/zenodo.17251512. Small supporting figures and tables were also attached to the submission.


    Articles from PLOS Biology are provided here courtesy of PLOS
