Abstract
Insects ally with microbial symbionts for a diversity of services. The range of these interactions is wide, spanning from beneficial to pathogenic and facultative to obligate. In many cases, such insect–microbial interactions veer towards mutual dependency with integrated physiologies. This evolutionary outcome is relatively common in insects that depend on microbes to fill gaps in their nutritional ecologies (e.g. plant-sap feeding). However, the initiation and transition towards such dependent symbiotic interactions are difficult to observe in nature. Identifying these events can provide key insights into the origins and evolutionary processes that shape symbiotic interactions. Here, we report on a novel interaction between a leafhopper (Typhlocybinae: Empoasca mexicana) and a bacterium, Symbiopectobacterium purcellii MEX strain (S-MEX). To characterize this symbiont, we assembled and annotated its complete genome. We compared its content and structure to the genomes of other Symbiopectobacterium. The S-MEX genome is unique among members of this genus. It is the largest yet sequenced at 5.3 Mb, encoding 6,838 genes (∼25% more than other strains). S-MEX's genome has significantly expanded due to the proliferation of insertion sequences and 2,723 identifiable pseudogenes—processes generally seen as accelerators of genome reduction and emerging host dependence. S-MEX and other Symbiopectobacterium strains have a core set of 818 genes shared in >90% of strains, of which S-MEX has uniquely lost 36 genes. Taken together, we hypothesize that due to expansion of IS elements, extensive pseudogenization, and loss of genes in important free-living functions, S-MEX is in the early stages of establishing a host-dependent symbiosis.
Keywords: Symbiopectobacterium purcellii, early symbiosis, IS elements, genome size, comparative genomics
Significance.
It is difficult to observe the initiation of dependent animal–bacterial symbioses in nature, and the early impact of this process on interacting genomes. We recently identified a leafhopper-associated symbiont with flexible ecological capabilities that can shed light on these processes. Using long-read genomic sequencing technology, we assembled a complete genome for this bacterial symbiont (Symbiopectobacterium purcellii MEX). We identified the hallmarks of a transitioning symbiotic relationship, including the expansion of insertion elements and the unique loss of genes involved in free-living lifestyles. These features are key theoretical predictions for the establishment of highly dependent animal–microbe symbiosis, driven by population reduction, reduced selection, and metabolic streamlining, providing a snapshot of the transition towards animal host dependence from a plant pathogenic state.
Introduction
Insect symbioses with microbes range from pathogenic to obligately beneficial (McCutcheon et al. 2019; Perreau and Moran 2022). The microbes involved in these interactions exhibit varying degrees of host dependence from circulating in the environment to obligately intracellular (Gupta and Nair 2020; Drew et al. 2021; Holt et al. 2024). In the latter case, hosts evolve specialized organs and cells to manage all microbial activities, leading to genome reduction and loss of cellular independence (Bennett and Moran 2015; Wilson and Duncan 2015; Latorre et al. 2017; McCutcheon et al. 2019). This evolutionary outcome has been well-documented, particularly among insects in the order Hemiptera (e.g. cicadas, aphids, whiteflies, etc.) that depend on microbes for nutrition lacking in their specialized diets (Bennett 2020; Kaltenpoth et al. 2025). Yet, despite accumulating examples of such symbioses, our understanding of how these interactions initiate and progress towards dependency remains limited.
The evolutionary pathway to becoming a dependent symbiont is likely gradual and involves several predictable evolutionary steps (McCutcheon et al. 2019; Siozios et al. 2024). At the far end of this journey, ancient endosymbionts have distinctively streamlined genomes, strict vertical transmission, and a restriction to specific host tissues and cells (McCutcheon and Moran 2011; McCutcheon et al. 2019). In contrast, at their earliest stages, dependent symbionts retain larger functional genomes, enabling more diverse transmission modes, environmental and ecological interactions, and host and tissue associations (McCutcheon and Moran 2011; Salem et al. 2015; Fisher et al. 2017; McCutcheon et al. 2019). As host–symbiont interactions become increasingly specific, these flexible capabilities—and the genes underlying them—are characteristically lost. The initiation of this process is marked by significant genomic upheaval in the microbial symbiont, as metabolic redundancies are purged and selection weakens due to the strengthening of drift (Koga and Moran 2014; Oakeson et al. 2014). A significant and paradoxical feature of this stage is genome expansion through the proliferation of repetitive elements (e.g. bacterial insertion sequences [IS]; Koga and Moran 2014; Siozios et al. 2024). This process likely serves as a key accelerator for the eventual extreme reduction in genome size and capability of microbial genomes, as repetitive elements insert into genes and irreversibly destroy them (McCutcheon et al. 2019). Although there are a few key examples of these early processes, novel cases can provide key insights into the early evolutionary stages of dependent symbioses within and between specific host and microbial groups (Siozios et al. 2024).
To understand how early host–symbiont dependency evolves, we identified and genomically characterized a strain of the insect symbiont, Symbiopectobacterium purcellii (Gammaproteobacteria), isolated from the leafhopper Empoasca mexicana (Cicadellidae: Typhlocybinae). Empoasca mexicana, like other members of the Typhlocybinae leafhopper subfamily, is not known to harbor obligate endosymbionts as are typically found in other leafhopper and auchenorrhynchan relatives (Buchner 1965; Moran et al. 2005; Kobiałka et al. 2025). By transitioning to a more nutrient-rich parenchyma diet, Typhlocybinae lost their dependence on obligate symbionts, along with the physiological, anatomical, and genomic architecture required to support them (Buchner 1965; Vasquez et al. 2024).
Symbiopectobacterium is a widespread and diverse group of bacterial symbionts closely related to insect-vectored Dickeya and Pectobacterium soft-rot plant pathogens (Charkowski 2018). In plants, Symbiopectobacterium strains have been observed to infect potato (Solanum tuberosum; Nunes Leite et al. 2023), buckeye trees (Aesculus californica; present study), ryegrass (Purcell et al. 1994), and likely others. In invertebrates, Symbiopectobacterium interactions range from obligately beneficial to facultatively parasitic across hosts, including nematodes, flies, leafhoppers, heteropterans, and others (Degnan et al. 2011; Martinson et al. 2020; Nadal-Jimenez et al. 2022). In at least some cases, these interactions appear to be host–symbiont specific, with Symbiopectobacterium infecting all members of a population or species (e.g. nematodes), having highly reduced genomes characteristic of coevolved host–symbiont dependencies (e.g. bulrush bugs), or reducing insect fitness when cross-infected from native to non-native hosts (e.g. leafhoppers) (Purcell et al. 1994; Martinson et al. 2020). Symbiopectobacterium have also been observed to be vertically transmitted, though some lineages retain the potential for horizontal transmission, particularly through plants (Degnan et al. 2011; Martinson et al. 2020). Due to its emerging ecological diversity, Symbiopectobacterium has been ranked alongside Sodalis, Arsenophonus, Wolbachia, and Rickettsia, among others, as one of the most prevalent symbionts of insects (Martinson et al. 2020; Nadal-Jimenez et al. 2022; Wierz et al. 2024). However, much remains to be understood about this bacterium and its evolutionary diversity.
Here, our Symbiopectobacterium strain isolated from E. mexicana leafhoppers, Symbiopectobacterium MEX (S-MEX), is culturable (Hill and Purcell 1995) and can locally infect California buckeye trees (Aesculus californica; Alexander Purcell personal observation). Genomic characterization of this symbiont with long-read sequencing revealed that S-MEX's genome exhibits a telltale sign of emerging host dependency (Degnan et al. 2011; McCutcheon et al. 2019). It possesses a large genome capable of producing amino acids and other metabolites, but it is riddled with IS elements, truncated genes, and more than 2,700 pseudogenes (Fig. 1 and Fig. S1).
Fig. 1.
Phylogenetic and gene presence–absence analyses for Symbiopectobacterium purcellii MEX (S-MEX) and 50 species and strains from the Symbiopectobacterium, Dickeya, and Pectobacterium group (clades color-coded pink, green, and yellow, respectively) show strain-specific genes. The nucleotide sequences of 88 BUSCO orthologs were aligned using MUSCLE v3.8.31, and the maximum likelihood tree on the left was reconstructed with RAxML v8.2.11, partitioned by gene with the GTR+FO+G4m model and 100 bootstrap (BS) replicates. Branches have 100 BS support unless indicated. The pan-genome was obtained using Roary v3.13.0, and gene presence and absence across all species and strains were plotted with Phandango. The S-MEX strain characterized in this study is highlighted in pink. The gene presence–absence matrix on the right indicates the presence of a gene (blue/pink bar) or its absence (white space) in each strain.
Results and Discussion
Evolutionary Relationships of S-MEX
To determine the evolutionary associations of S-MEX to other Symbiopectobacterium and related Dickeya and Pectobacterium, we reconstructed phylogenetic relationships with 88 protein-coding orthologs (Fig. 1). We then used gene presence–absence relationships to examine functional similarities between S-MEX and these other lineages (Fig. 1). Both approaches strongly support the placement of S-MEX within Symbiopectobacterium. Phylogenetically, S-MEX is closely related to S. purcellii SyEd1T (S-SyEd1T; 16S rRNA similarity = 99.4%), isolated from a European Empoasca leafhopper (Fig. 1; Martinson et al. 2020; Nadal-Jimenez et al. 2022). However, S-MEX diverges sharply in genomic architecture and content (Fig. 1).
Genomic Characteristics of S-MEX
The S-MEX genome is assembled into five circular contigs (one chromosome and four putative plasmids) (Table 1). Its genome is 5.3 megabases (average coverage = 71×; range = 50 to 150×), encoding 6,838 protein-coding genes (PCGs; Table S1). Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis of S-MEX's genome indicates a complete assembly and annotation with 98.4% (n = 122) complete single-copy orthologs (Table 1).
Table 1.
Genome assembly statistics showing the contiguity and completeness metrics for assembly and annotation of the Symbiopectobacterium strain isolated from Empoasca mexicana leafhoppers, S-MEX
| Metric | Value for S-MEX | Description |
|---|---|---|
| Number of reads | 680,671 | Number of PacBio reads used for S-MEX genome assembly |
| Number of assembled contigs | 5 (all contigs are circular) | Assembly contiguity and completeness metric |
| N50 | 3,285,917 | Contiguity metric, where 50% of assembly is contained in contigs equal to or larger than this length |
| N90 | 2,060,773 | Contiguity metric, where 90% of assembly is contained in contigs equal to or larger than this length |
| L50 | 1 | Contiguity metric showing number of contigs whose length makes up 50% of the genome size |
| L90 | 2 | Contiguity metric showing number of contigs whose length makes up 90% of the genome size |
| BUSCO complete single-copy (genome and proteome) | 98.4% (n = 122) | Completeness metric, 122/124 BUSCO groups are complete and found as single-copy genes in S-MEX |
| BUSCO missing (genome and proteome) | 0.8% (n = 1) | Completeness metric, 1/124 BUSCO groups are missing in S-MEX (Ribosomal Silencing Factor RsfS) |
| BUSCO fragmented (genome and proteome) | 0.8% (n = 1) | Completeness metric, 1/124 BUSCO groups are fragmented in S-MEX (Translation Initiation Factor 3, C-terminal InfC) |
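The contiguity metrics in Table 1 follow directly from the sorted contig lengths. The sketch below is a minimal illustration of those definitions, not the QUAST implementation; the function name `nx_lx` and the toy contig lengths are hypothetical and are not the S-MEX contigs.

```python
def nx_lx(contig_lengths, fraction):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the contig at which the running total of
    length-sorted contigs first reaches `fraction` of the assembly;
    Lx is the number of contigs needed to get there.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Only reachable through floating-point rounding at fraction == 1.
    return ordered[-1], len(ordered)


# Hypothetical contig lengths in bp (illustrative only).
toy_contigs = [500, 300, 120, 50, 30]
n50, l50 = nx_lx(toy_contigs, 0.5)
n90, l90 = nx_lx(toy_contigs, 0.9)
print(f"N50={n50} L50={l50} N90={n90} L90={l90}")
```

With the toy lengths, the largest contig alone holds half of the assembly, so L50 is 1, mirroring the situation reported for S-MEX.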
The total genome size of S-MEX is comparable to Symbiopectobacterium strains isolated from potato (NZEC127, NZEC135 and NZEC151; Nunes Leite et al. 2023). However, it is much larger than Symbiopectobacterium from animal hosts, including nematodes and bulrush bugs (Martinson et al. 2020). S-MEX has an expanded gene content compared to other strains, which range from 244 to 4,742 PCGs (Nadal-Jimenez et al. 2022). The closely related S-SyEd1T strain has a genome that is ∼370 kb smaller and encodes 2,471 fewer predicted PCGs and fewer predicted pseudogenes. Only S-NZEC135 has a similar gene count (6,956 PCGs), but its fragmented draft assembly makes direct comparisons difficult (1,588 scaffolds; Nunes Leite et al. 2023). Finally, S-MEX encodes 4,388 unique PCGs, most of which are associated with IS expansion and pseudogenes.
This S-MEX strain contains four circular, putative plasmids (Table S1). Plasmids are rarely identified in Symbiopectobacterium (Garber et al. 2021), as not all genome sequencing projects have reported them (e.g. S-SyEd1T; Martinson et al. 2020; Nadal-Jimenez et al. 2022; Nunes Leite et al. 2023). The 383 genes encoded in S-MEX's plasmids mostly lack functional predictions. Those that can be identified are involved in cell cycle control, cell division, chromosome partitioning, replication, and recombination and repair. Most of these genes are related to bacteriophage processes. Approximately 33% (n = 127) of the genes in the plasmids are also present in the chromosome, including ISs and transposases. Genes unique to the plasmids are associated with the Type II secretion system or conjugal plasmid transfer.
Finally, we note that the large differences in gene count between S-MEX and other Symbiopectobacterium lineages may arise from differences in sequencing and assembly efforts. Many Symbiopectobacterium strains have abundant highly repetitive content (e.g. IS and transposable elements; Bourque et al. 2018). We used long- and short-read sequencing to assemble a complete, high-quality chromosome and plasmids. To test whether short reads alone would suffice to assemble S-MEX's genome, we mapped the short-read sequences to its chromosome to identify contiguous (>10 bp) regions of low coverage (<30×) and compared these with annotated loci. We identified at least 734 genes that would be missed. Other Symbiopectobacterium projects that use long-read technology have similarly reported high-quality genomes (Nadal-Jimenez et al. 2022).
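The low-coverage screen described above can be sketched as a scan over per-base read depth followed by an interval overlap against annotated genes. This is a minimal illustration under stated assumptions: the depth vector is assumed to have been exported beforehand (for example from a per-base depth report of the short-read mapping), coordinates are 0-based and half-open, and the function names and toy data are hypothetical rather than the scripts used in the study.

```python
def low_coverage_regions(depth, max_depth=30, min_len=10):
    """Return half-open (start, end) runs where depth < max_depth
    for more than min_len consecutive positions."""
    regions = []
    start = None
    for pos, d in enumerate(depth):
        if d < max_depth:
            if start is None:
                start = pos          # open a new low-coverage run
        elif start is not None:
            if pos - start > min_len:
                regions.append((start, pos))
            start = None             # close the run
    # Handle a run that extends to the end of the sequence.
    if start is not None and len(depth) - start > min_len:
        regions.append((start, len(depth)))
    return regions


def genes_overlapping(regions, genes):
    """genes: iterable of (name, start, end), half-open coordinates.
    Return the names of genes that overlap any low-coverage region."""
    hits = []
    for name, g_start, g_end in genes:
        if any(g_start < r_end and r_start < g_end for r_start, r_end in regions):
            hits.append(name)
    return hits


# Toy depth profile and hypothetical gene coordinates (illustrative only).
toy_depth = [50] * 5 + [10] * 12 + [50] * 5 + [5] * 4 + [50] * 3 + [0] * 11
toy_genes = [("geneA", 0, 6), ("geneB", 18, 28), ("geneC", 35, 50)]
regions = low_coverage_regions(toy_depth)
print(regions, genes_overlapping(regions, toy_genes))
```

The four-base dip in the toy profile is ignored because it does not exceed the 10 bp minimum, which is the role the length threshold plays in the text.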
Dickeya–Pectobacterium–Symbiopectobacterium Have Few Shared Genes
To determine unique and shared genes between Symbiopectobacterium and Dickeya–Pectobacterium, we estimated core and pan-genomes. The pan-genome comprises 29,600 gene families across these 51 lineages (Table S2). Only 1,576 (5.32%) are part of the core genome. We note here that S-MEX is missing 298 of these core genes associated with anaerobic respiration, cell motility, and flagellar proteins.
The core genome of the Dickeya–Pectobacterium–Symbiopectobacterium group is enriched for proteins involved in transcriptional and translational processes, including peptide biosynthesis and ribosome and protein-RNA assembly (Fig. S2). It is also enriched for molecular functions such as RNA binding activity, aminoacyl-tRNA ligase activity, and ribosome structural components. These are essential genes typically retained by even the most reduced endosymbiont genomes (Moran and Bennett 2014).
The Dickeya–Pectobacterium–Symbiopectobacterium accessory genome consists of 28,024 gene families, with gene ontology (GO) biological processes enriched for carbohydrate and alcohol metabolic processes, interspecies interactions, and symbiont entry into hosts. GO cellular components are enriched in genes for periplasmic space, outer cell membrane, and extracellular region.
S-MEX Has Unique Genomic Capabilities and Losses
To understand how S-MEX differs from other insect-associated Symbiopectobacterium, we examined core and accessory genes for this group of eight strains (Table S2). We identified 818 core genes and a pangenome of 16,870 gene families. Genes in the core genome of the Symbiopectobacterium group predominantly have housekeeping functions (e.g. macromolecular biosynthesis, transcription and translation, and ribonucleoprotein assembly; Fig. S3).
S-MEX has a large number of species-specific genes (>50%), with 4,388 singleton genes present only in S-MEX. These genes may play roles in interspecies interactions, host interactions, symbiotic processes, and cell entry (e.g. integrases [intS] and adhesin/antigen [eaeH]; Fig. S4). These genes also include the massive expansion of prophage integrases and IS families (discussed below). Among the Symbiopectobacterium, S-MEX and SyEd1T are distinguished by retaining genes involved in amino acid and vitamin synthesis, cell envelope synthesis, and DNA repair (Fig. S1). Many of these genes are typically lost from ancient insect endosymbionts (McCutcheon and Moran 2011; Moran and Bennett 2014). Some of these genes, such as those related to amino acids, may benefit their hosts by balancing nutrient-limited diets (Hansen and Moran 2014). Further comparison of S-MEX and SyEd1T revealed marked differences between these strains. In S-MEX, these are characterized by the increased loss of genes involved in the utilization of alternative carbon sources (in pathways such as D-Glucuronate and D-Galactonate biosynthesis), anaerobic metabolism (fumarate reductase), and vitamin K1 (phylloquinone) and K2 (menaquinone) biosynthesis, among other pathways (Fig. 2). In contrast, SyEd1T has increased gene loss in formaldehyde assimilation and methionine degradation pathways compared to S-MEX (Fig. 2). The retention of genes involved in the utilization of alternate carbon sources and anaerobic metabolism in SyEd1T suggests that this strain occupies a broader host and ecological niche than S-MEX.
Fig. 2.
Comparison of KEGG pathway completeness between Symbiopectobacterium purcellii MEX (S-MEX) and S. purcellii SyEd1T shows loss of genes in alternate carbon utilization and vitamin K biosynthesis, among other pathways, in S-MEX. This indicates a more restricted environmental niche for S-MEX than for SyEd1T. KEGG pathway completeness for each strain was estimated using KO term annotations from eggNOG-mapper with the kegg-pathways-completeness tool (v1.3.0). The pathways found in both strains are shown as circles with sizes corresponding to the extent of similarity in pathway completeness between the strains. Pathways found only in S-MEX or SyEd1T are listed on the right of the scatterplot with percentage of completeness denoted in parentheses.
We further identified 36 genes present in all Symbiopectobacterium, including S-SyEd1T, but absent from S-MEX. These genes include membrane proteins (yidD), ribosomal proteins (rpmJ and rpmH), transporters (gltP, fhuC, dppA, and sstT), RNA methyltransferase (cmoM), phosphorylases (deoA and deoD), and genes involved in motility (fliS and flgC flagellar proteins). Taken together, the loss of transporters, ribosomal elements, and motility genes indicates that S-MEX is losing environmental independence and metabolic flexibility (Moran and Bennett 2014; Wilson and Duncan 2015).
S-MEX further retains some elements of canonical bacterial secretion systems, including the Type I (T1SS), Type II (T2SS), Type III (T3SS), and Type IV (T4SS) secretion systems (Fig. S1). Comparison of secretion systems between S-MEX and SyEd1T shows that S-MEX lost genes in T1SS and T2SS, the systems involved in the secretion of plant cell wall-degrading enzymes in the Dickeya genus (Alič et al. 2019). This suggests that S-MEX is losing the ability to degrade plant cell walls and likely the ability to infect plants. Additionally, while the Type VI secretion system (T6SS) is present in SyEd1T, it is largely absent from S-MEX, whereas T4SS genes are present in S-MEX but absent from SyEd1T (Fig. S1). Although T6SS is important for plant pathogenesis and for competing in microbe-rich environments (Siozios et al. 2024), it is also possible that proteins otherwise secreted by T6SS are instead secreted by T4SS in S-MEX (Matte et al. 2024).
T3SS is the most extensive intact secretion system in both S-MEX and SyEd1T. T3SS is important for the contact-dependent transfer of proteins from bacterial symbionts and pathogens to host cells. S-MEX has 40 and SyEd1T has 39 putative T3SS genes, including Inv/Spa-like genes, SseB and SseC translocators, and SseE effector proteins (Puhar and Sansonetti 2014; Deng et al. 2017). The Dickeya–Pectobacterium strains, on the other hand, uniquely possess Hrp/Hrc genes involved in plant pathogenesis. The loss of these genes in Symbiopectobacterium indicates a reduced ability to infect plants (Büttner and He 2009). Moreover, the Inv/Spa genes retained in Symbiopectobacterium are important for infection and symbiotic relationships with insects, underscoring their transitions to this symbiotic lifestyle (Dale et al. 2002).
Finally, S-MEX's genome has experienced a dramatic expansion of IS elements. We identified 1,231 ISs, spanning 16 transposase families. The most abundant are IS5 (n = 410), IS256 (n = 280), and IS3 (n = 221) families. ISs are small transposable elements found in limited numbers in bacterial genomes, where selection mitigates the deleterious effects of their proliferation (Siguier et al. 2014). However, in relatively young, host-dependent symbionts that experience strong genetic drift, the effectiveness of selection in controlling IS proliferation is diminished (Koga and Moran 2014; Oakeson et al. 2014; Siguier et al. 2014). Extensive expansion of these elements has been observed in some relatively young bacterial symbionts, with up to a third of their PCGs comprising transposases (Siozios et al. 2024). Thus, the expansion of ISs is thought to be an early mechanism by which symbiont genomes begin their evolutionary journey towards extreme reduction (Siguier et al. 2014; McCutcheon et al. 2019).
Symbiopectobacteria Have Undergone Varying Degrees of Pseudogenization
Pseudogenes were identified in each of the Dickeya–Pectobacterium–Symbiopectobacterium genomes included in this study. However, Dickeya and Pectobacterium have far fewer: 2.73% and 3.32% pseudogene content, respectively. Some strains within the Symbiopectobacterium clade also exhibit relatively few pseudogenes, including strains isolated from potato (S-NZEC127 with 4.39%, S-NZEC135 with 14.2%, and S-NZEC151 with 3.37% pseudogenes). The closely related S-SyEd1T's genome further has ∼7% pseudogene content (Nadal-Jimenez et al. 2022). In contrast to these lineages, several Symbiopectobacterium have undergone extensive pseudogenization (S-NA = 27.59%, S-BC = 36.51%, and S-PLON1 = 34.25%).
S-MEX's genome contains 2,723 pseudogenes (23.72% of its genome). Of these, 1,563 are truncated or merged protein sequences, and 1,160 are decayed genes in intergenic regions. Pseudogenes that can be annotated are enriched for metabolite transport, fatty acid beta-oxidation, and anaerobic respiration, among other functions (Fig. S5). The loss of amino acid transporters, despite maintaining amino acid biosynthesis, is a common characteristic of insect-dependent symbionts (Moran et al. 2003; McCutcheon and Moran 2011). Furthermore, S-MEX's loss of genes coding for anaerobic respiration and fatty acid beta-oxidation suggests a transition away from a pathogenic lifestyle (Toh et al. 2006; Hadizadeh et al. 2024).
Conclusion
We have identified a novel Symbiopectobacterium, S-MEX, isolated from the Empoasca mexicana leafhopper, that exhibits several key hallmarks indicative of the early stages of a dependent symbiosis with an insect host. Its genome has significantly expanded in size due to the proliferation of ISs and is consequently littered with pseudogenes and truncated gene fragments. This process is generally regarded as an early accelerator of genome reduction and a telltale sign of emerging host dependence (Degnan et al. 2011; Koga and Moran 2014; Siguier et al. 2014; McCutcheon et al. 2019). Moreover, S-MEX has lost key genes and capabilities (e.g. cellular motility, anaerobic respiration, vitamin biosynthesis, and secretion systems) that enable the plant pathogenicity and environmental flexibility observed in its Dickeya–Pectobacterium soft-rot relatives and in its close insect-associated relative S-SyEd1T (Backus 1988). Taken together, although S-MEX can infect plants similarly to S-SyEd1T (Nadal-Jimenez et al. 2022) and vertical transmission is unknown (Alexander Purcell personal communication), its genome suggests that it is on a clear trajectory toward a dependent symbiosis.
Materials and Methods
Sample Acquisition and Bacterial Isolation
Empoasca mexicana leafhoppers were collected from buckeye trees (Aesculus californica) in Berkeley, CA, USA, in July 2017. Insects were surface-sterilized for 2 min in 70% ethanol, bleached for 2 min in 30% bleach, and rinsed in distilled water thrice for 2 min. Heads from two individuals were homogenized into SCP buffer using a Polytron (Brinkman Instruments; Hill and Purcell 1995). Five droplets (20 µL each) of the homogenate were plated on PWG media (Hill and Purcell 1995) and incubated for 10 d at 28 °C.
DNA Extraction and Sequencing
A single colony from one of the two plates was triple cloned on PWG. Bacterial DNA was extracted using the Blood and Tissue kit (Qiagen) after pretreatment for Gram-negative bacteria. Library preparation was performed by the QB3-Berkeley Functional Genomics Laboratory at UC Berkeley. A Diagenode Bioruptor Pico was used to fragment the DNA, and library preparation was performed using the KAPA Hyper Prep kit for DNA (KK8504). Samples were checked for quality on an AATI (now Agilent) Fragment Analyzer after adapter ligation and PCR amplification. Libraries were then pooled evenly by molarity and sequenced on an Illumina HiSeq 4000 and PacBio Sequel I at the Vincent J. Coates Genomics Sequencing Laboratory (GSL) facility at UC Berkeley.
Genome Assembly
Genome assembly leveraged both long-read (PacBio) and short-read (Illumina) sequencing. PacBio long-reads were error corrected using Canu v.2.2 (Koren et al. 2017) (parameters: genomeSize = 4.9 m, corOutCoverage = 100) and corrected reads were assembled with Flye v.2.9 (Kolmogorov et al. 2019) (parameters: genome-size 4.9 m), resulting in seven contigs with a total length of 5.7 Mb and a mean coverage of 71×. Illumina reads were quality-trimmed using Trimmomatic v.0.39 (Bolger et al. 2014) (parameters: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). PacBio contigs were polished with trimmed Illumina reads using two rounds of polishing with Pilon v.1.24 (Walker et al. 2014). We then manually merged contigs with homologous ends resulting in five circular contigs, with the largest contig being 5.312 Mb. Assembly completeness and quality were evaluated using QUAST v.5.2.0 (Mikheenko et al. 2018) and BUSCO v.5.2.2 (Manni et al. 2021; Simão et al. 2015) against the bacteria_odb10 (n = 124 genes) database (Table 1).
Annotation and Pan-Genome Analysis
Assembled contigs were annotated using Prokka v.1.14.6 (Seemann 2014) (parameters: --compliant, --kingdom Bacteria, --gcode 11, --rfam, --rnammer). Publicly available assemblies for other Symbiopectobacterium, Dickeya, and Pectobacterium strains were also obtained and annotated with Prokka using the same parameters (Table S2). Prokka-annotated PCGs were used to construct the pan-genome for S-MEX and related Dickeya–Pectobacterium–Symbiopectobacterium lineages (Martinson et al. 2020) using Roary v.3.13.0 (Page et al. 2015). PCGs present in at least 90% of the strains, with at least 70% amino acid identity to each other, were considered core genes. Gene presence and absence were visualized using Phandango (Hadfield et al. 2018) (Fig. 1).
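The 90% core-gene rule can be illustrated on a gene presence–absence table. The sketch below is a simplified stand-in for that single step, assuming gene families have already been clustered (the 70% identity clustering is not reproduced here); the function names, strain labels, and family names are hypothetical.

```python
def split_core_accessory(presence, strains, core_percent=90):
    """Split gene families into core and accessory sets.

    presence: dict mapping family name -> set of strains carrying it.
    A family is core if it occurs in at least core_percent % of strains.
    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    strain_set = set(strains)
    n = len(strain_set)
    if n == 0:
        raise ValueError("need at least one strain")
    core, accessory = [], []
    for family, carriers in presence.items():
        count = len(carriers & strain_set)
        if 100 * count >= core_percent * n:
            core.append(family)
        else:
            accessory.append(family)
    return sorted(core), sorted(accessory)


def missing_core(core, presence, strain):
    """Core families that a given strain lacks."""
    return [family for family in core if strain not in presence[family]]


# Toy data: ten hypothetical strains and four hypothetical gene families.
strains = [f"s{i}" for i in range(1, 11)]
presence = {
    "famA": set(strains),            # present in 10/10 strains
    "famB": set(strains) - {"s1"},   # present in 9/10 strains
    "famC": set(strains[2:]),        # present in 8/10 strains
    "famD": {"s1"},                  # singleton
}
core, accessory = split_core_accessory(presence, strains)
print(core, accessory, missing_core(core, presence, "s1"))
```

Because the threshold is below 100%, a strain can lack some core families, which is how S-MEX can be missing genes that still count as core for the group.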
Phylogeny
BUSCO genes (n = 88; ∼25,000 sites) identified in all strains in the Dickeya–Pectobacterium–Symbiopectobacterium group and in the outgroup (Escherichia coli K-12) were used to construct a phylogeny. Nucleotide sequences of the genes were aligned with MUSCLE v.3.8.31 (Edgar 2004). RAxML v.8.2.11 (Stamatakis et al. 2005) was used to reconstruct a tree for the concatenated matrix under a GTR + FO + G4m model assigned to each gene-wise partition and 100 bootstraps.
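A gene-wise partitioned analysis of a concatenated matrix needs the column range of each gene. The sketch below derives those ranges from per-gene alignment lengths and writes them as `DNA, name = start-end` lines, a partition-file style that RAxML accepts; whether the study used exactly this file layout is an assumption, and the function name, gene names, and lengths are hypothetical.

```python
def partition_lines(gene_lengths, data_type="DNA"):
    """Build partition lines for a concatenated alignment.

    gene_lengths: ordered list of (gene_name, aligned_length) in the
    same order in which the gene alignments were concatenated.
    Coordinates are 1-based and inclusive.
    """
    lines = []
    start = 1
    for name, length in gene_lengths:
        if length <= 0:
            raise ValueError(f"alignment length for {name} must be positive")
        end = start + length - 1
        lines.append(f"{data_type}, {name} = {start}-{end}")
        start = end + 1
    return lines


# Hypothetical ortholog names and aligned lengths (illustrative only).
toy_genes = [("busco_1", 300), ("busco_2", 150), ("busco_3", 450)]
for line in partition_lines(toy_genes):
    print(line)
```

The last end coordinate equals the total number of sites, which gives a quick consistency check against the length of the concatenated matrix.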
GO Annotation and Functional Enrichment
Orthologs of the PCGs, their GO functions (Ashburner et al. 2000), KEGG ortholog (KO) functions, and Clusters of Orthologous Groups (COG) (Tatusov 2000) were annotated using eggNOG-mapper v.2.1.3 (Huerta-Cepas et al. 2017) in diamond mode (Buchfink et al. 2015) using the EggNOG database for bacterial species (Huerta-Cepas et al. 2019) (parameters selected to include domain-based annotations: query_cover = 20, subject_cover = 20, pident = 40, score = 60, evalue = 0.001). The enriched GO functional categories among gene sets were identified using enrichR v.3.2 (Kuleshov et al. 2016), and KEGG pathway completeness was estimated using the KO term annotations with the kegg-pathways-completeness tool v.1.3. Additionally, pseudogenes were annotated for all strains using Pseudofinder v.1.0 (Syberg-Olsen et al. 2020) in diamond mode (Buchfink et al. 2015) to identify truncated or merged sequences and remnants of PCGs in intergenic sequences. In silico predictions of pseudogenes may be an underestimate compared to manual annotation. However, this common computational approach is consistent with other efforts, and pseudogene inference should be a reliable indicator of relative variation across genomes. All statistical analyses and graphing were performed using custom Python v.3.9 and R v.4.1.0 scripts.
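Functional enrichment of a gene set is commonly framed as an over-representation test. The sketch below computes a one-sided hypergeometric tail probability for a single category; it is offered only to illustrate the idea and is not claimed to match the statistic or multiple-testing correction applied by enrichR. The function name and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N: genes in the background
    K: background genes annotated with the category
    n: genes in the set of interest
    k: genes in the set of interest annotated with the category
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("K and n must lie between 0 and N")
    upper = min(n, K)
    if not 0 <= k <= upper:
        raise ValueError("k must lie between 0 and min(n, K)")
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy counts (illustrative only): 10 background genes, 4 annotated,
# 3 genes drawn, 2 of them annotated.
print(enrichment_p_value(k=2, n=3, K=4, N=10))
```

In practice one such test is run per GO term, so the resulting p-values need correction for multiple testing before categories are reported as enriched.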
Acknowledgments
We thank Alexander Purcell for his helpful insights and discussion related to this work and strain discovery. We also thank Younghwan Kwak, Reo Maynard, Ryan Torres, and Miguel Estrada Caballero for helpful feedback on this article.
Contributor Information
Deepika Gunasekaran, Department of Molecular and Cellular Biology, University of California, Merced, USA.
Anne Sicard, INRAE, Université de Strasbourg, UMR SVQV, Colmar, France.
Rodrigo P P Almeida, Department of Environmental Science, Policy and Management, University of California, Berkeley, USA.
Gordon M Bennett, Department of Life and Environmental Sciences, University of California, Merced, USA.
Supplementary Material
Supplementary material is available at Genome Biology and Evolution online.
Data Availability
The complete genome assembly of S-MEX has been submitted to GenBank (accession CP189882). Raw reads can be accessed using the Sequence Read Archive (SRA) accessions SRR33420639 and SRR33420640, and BioProject accession PRJNA1257457.
Literature Cited
- Alič Š, Pédron J, Dreo T, Van Gijsegem F. Genomic characterisation of the new Dickeya fangzhongdai species regrouping plant pathogens and environmental isolates. BMC Genomics. 2019:20:1–18. 10.1186/S12864-018-5332-3/FIGURES/6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ashburner M, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000:25:25–29. 10.1038/75556 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Backus EA. Sensory systems and behaviours which mediate hemipteran plant-feeding: a taxonomic overview. J Insect Physiol. 1988:34:151–165. 10.1016/0022-1910(88)90045-5 [DOI] [Google Scholar]
- Bennett G. Evolving integrated multipartite symbioses between plant-sap feeding insects (Hemiptera) and their endosymbionts. Cellular dialogues in the Holobiont. 2020:173–200. 10.1201/9780429277375-11/EVOLVING-INTEGRATED-MULTIPARTITE-SYMBIOSES-PLANT-SAP-FEEDING-INSECTS-HEMIPTERA-ENDOSYMBIONTS-GORDON-BENNETT [DOI] [Google Scholar]
- Bennett GM, Moran NA. Heritable symbiosis: the advantages and perils of an evolutionary rabbit hole. Proc Natl Acad Sci U S A. 2015:112:10169–10176. 10.1073/PNAS.1421388112/ASSET/7C9A8366-92EA-4892-BE66-6DB903B5298F/ASSETS/GRAPHIC/PNAS.1421388112FIG03.JPEG [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014:30:2114–2120. 10.1093/bioinformatics/btu170 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bourque G, et al. Ten things you should know about transposable elements. Genome Biol. 2018:19:1–12. 10.1186/S13059-018-1577-Z [DOI] [PMC free article] [PubMed] [Google Scholar]
- Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015:12:59–60. 10.1038/nmeth.3176 [DOI] [PubMed] [Google Scholar]
- Buchner P. Endosymbiosis of animals with plant microorganisms. Public Library of Science; 1965. 10.1371/JOURNAL.PONE.0189779 [DOI] [Google Scholar]
- Büttner D, He SY. Type III protein secretion in plant pathogenic bacteria. Plant Physiol. 2009:150:1656–1664. 10.1104/PP.109.139089 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Charkowski AO. The changing face of bacterial soft-rot diseases. Annu Rev Phytopathol. 2018:56:269–288. 10.1146/ANNUREV-PHYTO-080417-045906/CITE/REFWORKS [DOI] [PubMed] [Google Scholar]
- Dale C, Plague GR, Wang B, Ochman H, Moran NA. Type III secretion systems and the evolution of mutualistic endosymbiosis. Proc Natl Acad Sci U S A. 2002:99:12397–12402. 10.1073/PNAS.182213299/ASSET/399C845C-CAC4-4D44-832E-CE009462F9EF/ASSETS/GRAPHIC/PQ1822132003.JPEG [DOI] [PMC free article] [PubMed] [Google Scholar]
- Degnan PH, et al. Origin and examination of a leafhopper facultative endosymbiont. Curr Microbiol. 2011:62:1565–1572. 10.1007/S00284-011-9893-5/FIGURES/2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Deng W, et al. Assembly, structure, function and regulation of type III secretion systems. Nat Rev Microbiol. 2017:15:323–337. 10.1038/nrmicro.2017.20 [DOI] [PubMed] [Google Scholar]
- Drew GC, Stevens EJ, King KC. Microbial evolution and transitions along the parasite–mutualist continuum. Nat Rev Microbiol. 2021:19:623–638. 10.1038/s41579-021-00550-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004:32:1792–1797. 10.1093/NAR/GKH340 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fisher RM, Henry LM, Cornwallis CK, Kiers ET, West SA. The evolution of host-symbiont dependence. Nat Commun. 2017:8:1–8. 10.1038/NCOMMS15973;SUBJMETA [DOI] [PMC free article] [PubMed] [Google Scholar]
- Garber AI, et al. The evolution of interdependence in a four-way mealybug symbiosis. Genome Biol Evol. 2021:13(8): evab123. 10.1093/GBE/EVAB123 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gupta A, Nair S. Dynamics of insect–microbiome interaction influence host and microbial symbiont. Front Microbiol. 2020:11:545024. 10.3389/FMICB.2020.01357/BIBTEX [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hadfield J, et al. Phandango: an interactive viewer for bacterial population genomics. Bioinformatics. 2018:34:292–293. 10.1093/bioinformatics/btx610 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hadizadeh I, et al. Transcriptome analysis unravels the biocontrol mechanism of Serratia plymuthica A30 against potato soft rot caused by Dickeya solani. PLoS One. 2024:19:e0308744. 10.1371/JOURNAL.PONE.0308744 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hansen AK, Moran NA. The impact of microbial symbionts on host plant utilization by herbivorous insects. Mol Ecol. 2014:23:1473–1496. 10.1111/MEC.12421 [DOI] [PubMed] [Google Scholar]
- Hill BL, Purcell AH. Acquisition and retention of Xylella fastidiosa by an efficient vector, Graphocephala atropunctata. Phytopathology. 1995:85:209–212. 10.1094/PHYTO-85-209 [DOI] [Google Scholar]
- Holt JR, Cavichiolli de Oliveira N, Medina RF, Malacrinò A, Lindsey ARI. Insect–microbe interactions and their influence on organisms and ecosystems. Ecol Evol. 2024:14:e11699. 10.1002/ECE3.11699 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huerta-Cepas J, et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol Biol Evol. 2017:34:2115–2122. 10.1093/molbev/msx148 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019:47:D309–D314. 10.1093/nar/gky1085 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kaltenpoth M, Flórez LV, Vigneron A, Dirksen P, Engl T. Origin and function of beneficial bacterial symbioses in insects. Nat Rev Microbiol. 2025:23:551–567. 10.1038/s41579-025-01164-z [DOI] [PubMed] [Google Scholar]
- Kobiałka M, Świerczewski D, Walczak M, Urbańczyk W. Extremely distinct microbial communities in closely related leafhopper subfamilies: Typhlocybinae and Eurymelinae (Cicadellidae, Hemiptera). mSystems. 2025:10:e0060325. 10.1128/MSYSTEMS.00603-25/SUPPL_FILE/MSYSTEMS.00603-25-S0009.MP4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Koga R, Moran NA. Swapping symbionts in spittlebugs: evolutionary replacement of a reduced genome symbiont. ISME J. 2014:8:1237–1246. 10.1038/ISMEJ.2013.235 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019:37:540–546. 10.1038/s41587-019-0072-8 [DOI] [PubMed] [Google Scholar]
- Koren S, et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017:27:722–736. 10.1101/gr.215087.116 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kuleshov MV, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016:44:W90–W97. 10.1093/nar/gkw377 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Latorre A, Manzano-Marín A, JoséJos C, Beltrán J. Dissecting genome reduction and trait loss in insect endosymbionts. Ann N Y Acad Sci. 2017:1389:52–75. 10.1111/NYAS.13222 [DOI] [PubMed] [Google Scholar]
- Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021:38:4647–4654. 10.1093/MOLBEV/MSAB199 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Martinson VG, et al. Multiple origins of obligate nematode and insect symbionts by a clade of bacteria closely related to plant pathogens. Proc Natl Acad Sci U S A. 2020:117:31979–31986. 10.1073/PNAS.2000860117/SUPPL_FILE/PNAS.2000860117.SAPP.PDF [DOI] [PMC free article] [PubMed] [Google Scholar]
- Matte LM, Genal AV, Landolt EF, Danka ES. T6SS in plant pathogens: unique mechanisms in complex hosts. Infect Immun. 2024:92:e0050023. 10.1128/IAI.00500-23 [DOI] [PMC free article] [PubMed] [Google Scholar]
- McCutcheon JP, Boyd BM, Dale C. The life of an insect endosymbiont from the cradle to the grave. Curr Biol. 2019:29:R485–R495. 10.1016/J.CUB.2019.03.032 [DOI] [PubMed] [Google Scholar]
- McCutcheon JP, Moran NA. Extreme genome reduction in symbiotic bacteria. Nat Rev Microbiol. 2011:10:13–26. 10.1038/nrmicro2670 [DOI] [PubMed] [Google Scholar]
- Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018:34:i142–i150. 10.1093/BIOINFORMATICS/BTY266 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Moran NA, Bennett GM. The tiniest tiny genomes. Annu Rev Microbiol. 2014:68:195–215. 10.1146/ANNUREV-MICRO-091213-112901/CITE/REFWORKS [DOI] [PubMed] [Google Scholar]
- Moran NA, Plague GR, Sandström JP, Wilcox JL. A genomic perspective on nutrient provisioning by bacterial symbionts of insects. Proc Natl Acad Sci U S A. 2003:100:14543–14548. 10.1073/PNAS.2135345100/ASSET/073ABF00-E618-4431-90A5-719C295572FF/ASSETS/GRAPHIC/PQ2135345002.JPEG [DOI] [PMC free article] [PubMed] [Google Scholar]
- Moran NA, Tran P, Gerardo NM. Symbiosis and insect diversification: an ancient symbiont of sap-feeding insects from the bacterial phylum bacteroidetes. Appl Environ Microbiol. 2005:71:8802–8810. 10.1128/AEM.71.12.8802-8810.2005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nadal-Jimenez P, Siozios S, Halliday N, Cámara M, Hurst GDD. Symbiopectobacterium purcellii, gen. nov., sp. nov., isolated from the leafhopper Empoasca decipiens. Int J Syst Evol Microbiol. 2022:72:005440. 10.1099/IJSEM.0.005440/CITE/REFWORKS [DOI] [PubMed] [Google Scholar]
- Nunes Leite L, Visnovsky SB, Wright PJ, Pitman AR. Draft genome sequences of three “Candidatus Symbiopectobacterium” isolates collected from potato tubers grown in New Zealand. Microbiol Resour Announc. 2023:12:e0114822. 10.1128/MRA.01148-22 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Oakeson KF, et al. Genome degeneration and adaptation in a nascent stage of symbiosis. Genome Biol Evol. 2014:6:76–93. 10.1093/GBE/EVT210 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Page AJ, et al. Roary: rapid large-scale prokaryote pan-genome analysis. Bioinformatics. 2015:31:3691–3693. 10.1093/bioinformatics/btv421 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Perreau J, Moran NA. Genetic innovations in animal–microbe symbioses. Nat Rev Genet. 2022:23:23–39. 10.1038/s41576-021-00395-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- Puhar A, Sansonetti PJ. Type III secretion system. Curr Biol. 2014:24:R784–R791. 10.1016/j.cub.2014.07.016 [DOI] [PubMed] [Google Scholar]
- Purcell AH, Suslow KG, Klein M. Transmission via plants of an insect pathogenic bacterium that does not multiply or move in plants. Microb Ecol. 1994:27:19–26. 10.1007/BF00170111/METRICS [DOI] [PubMed] [Google Scholar]
- Salem H, Florez L, Gerardo N, Kaltenpoth M. An out-of-body experience: the extracellular dimension for the transmission of mutualistic bacteria in insects. Proc R Soc Lond B Biol Sci. 2015:282:20142957. 10.1098/RSPB.2014.2957 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014:30:2068–2069. 10.1093/bioinformatics/btu153 [DOI] [PubMed] [Google Scholar]
- Siguier P, Gourbeyre E, Chandler M. Bacterial insertion sequences: their genomic impact and diversity. FEMS Microbiol Rev. 2014:38:865–891. 10.1111/1574-6976.12067 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015:31:3210–3212. 10.1093/BIOINFORMATICS/BTV351 [DOI] [PubMed] [Google Scholar]
- Siozios S, et al. Genome dynamics across the evolutionary transition to endosymbiosis. Curr Biol. 2024:34:5659–5670.e7. 10.1016/J.CUB.2024.10.044 [DOI] [PubMed] [Google Scholar]
- Stamatakis A, Ludwig T, Meier H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics. 2005:21:456–463. 10.1093/bioinformatics/bti191 [DOI] [PubMed] [Google Scholar]
- Syberg-Olsen M, Arkadiy G, Keeling P, McCutcheon J, Husnik F. Pseudofinder. GitHub Repository. 2020. https://github.com/filip-husnik/pseudofinder/
- Tatusov RL. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000:28:33–36. 10.1093/nar/28.1.33 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Toh H, et al. Massive genome erosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host. Genome Res. 2006:16:149–156. 10.1101/GR.4106106 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vasquez YM, Li Z, Xue AZ, Bennett GM. Chromosome-level genome assembly of the aster leafhopper (Macrosteles quadrilineatus) reveals the role of environment and microbial symbiosis in shaping pest insect genome evolution. Mol Ecol Resour. 2024:24:e13919. 10.1111/1755-0998.13919 [DOI] [PubMed] [Google Scholar]
- Walker BJ, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014:9:e112963. 10.1371/journal.pone.0112963 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wierz JC, et al. Intracellular symbiont Symbiodolus is vertically transmitted and widespread across insect orders. ISME J. 2024:18:99. 10.1093/ISMEJO/WRAE099 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wilson ACC, Duncan RP. Signatures of host/symbiont genome coevolution in insect nutritional endosymbioses. Proc Natl Acad Sci U S A. 2015:112:10255–10261. 10.1073/PNAS.1423305112/ASSET/379C85F8-1B48-4550-827D-C3DCF57D8223/ASSETS/GRAPHIC/PNAS.1423305112FIG03.JPEG [DOI] [PMC free article] [PubMed] [Google Scholar]