mSystems. 2023 Sep 26;8(5):e00706-23. doi: 10.1128/msystems.00706-23

Arsenophonus symbiosis with louse flies: multiple origins, coevolutionary dynamics, and metabolic significance

Jana Martin Říhová 1, Shruti Gupta 1, Alistair C Darby 2, Eva Nováková 1,3, Václav Hypša 1,3
Editor: Jack A Gilbert 4
PMCID: PMC10654098  PMID: 37750682

ABSTRACT

Arsenophonus is a widespread insect symbiont with life strategies that vary from parasitism to obligate mutualism. In insects living exclusively on vertebrate blood, mutualistic Arsenophonus strains are presumed to provide B vitamins missing in the insect host diet. Hippoboscidae, obligate blood feeders related to tsetse flies, have been previously suggested to have acquired Arsenophonus symbionts in several independent events. Based on comparative genomic analyses of 11 Hippoboscidae-associated strains, 9 of them newly assembled, we reveal a wide range of their genomic characteristics and phylogenetic affiliations. Phylogenetic patterns and genomic traits split the strains into two different types. Seven strains display characteristics of obligate mutualists with significantly reduced genomes and long phylogenetic branches. The remaining four strains cluster on short branches, and their genomes resemble those of free-living bacteria or facultative symbionts. Both phylogenetic positions and genomic traits indicate that evolutionary history of the Hippoboscidae-Arsenophonus associations is a mixture of short-term coevolutions with at least four independent origins. The comparative approach to a reconstruction of B vitamin pathways across the available Arsenophonus genomes has produced two kinds of patterns. On one hand, it indicates the different importance of individual B vitamins in the host-symbiont interaction. While some (riboflavin, pantothenate, and folate) seem to be synthesized by all Hippoboscidae-associated obligate symbionts, pathways for others (thiamine, nicotinamide, and cobalamin) are mostly missing. On the other hand, the broad comparison has produced patterns that can serve as bases for further assessments of the pathways’ completeness and functionality.

IMPORTANCE

Insects that live exclusively on vertebrate blood utilize symbiotic bacteria as a source of essential compounds, e.g., B vitamins. In louse flies, the most frequent symbiont originated in genus Arsenophonus, known from a wide range of insects. Here, we analyze genomic traits, phylogenetic origins, and metabolic capacities of 11 Arsenophonus strains associated with louse flies. We show that in louse flies, Arsenophonus established symbiosis in at least four independent events, reaching different stages of symbiogenesis. This allowed for comparative genomic analysis, including convergence of metabolic capacities. The significance of the results is twofold. First, based on a comparison of independently originated Arsenophonus symbioses, it determines the importance of individual B vitamins for the insect host. This expands our theoretical insight into insect-bacteria symbiosis. The second outcome is of methodological significance. We show that the comparative approach reveals artifacts that would be difficult to identify based on a single-genome analysis.

KEYWORDS: bacterial symbiosis, hematophagy, coevolution, genome evolution

INTRODUCTION

Most insects maintain associations with communities of bacteria, with life strategies ranging from pathogens to obligate mutualists. The generally accepted view assumes that, upon entering the host, the bacteria display characteristics identical or similar to those of their free-living relatives. During the early stage of adaptation, they are distributed in nonspecialized host cells or extracellularly in the gut lumen and rely on both vertical and horizontal transmissions to the new host. From the host’s perspective, these bacteria can be pathogens/parasites or facultative symbionts, often of uncertain significance. Some of them may gradually evolve into obligatory mutualists. This process involves several dramatic changes in their biology and genomes. They usually become restricted to specialized host cells and organs (called bacteriocytes and bacteriomes, respectively) (1), and their transmission to a new host is exclusively vertical, from mother to progeny. However, the most radical change is rapid genomic evolution. Due to the small effective population sizes and relaxed selection, the symbionts undergo a decay process, manifested by shrinking of the genome through the loss of unnecessary genes and by a shift of the nucleotide composition toward adenine and thymine bases (2). This process may eventually lead to complete loss of the symbiont and its replacement with a new competent bacterium (3). Such losses and replacements are manifested by incongruent phylogenies of hosts and their symbionts and also by striking genomic differences of the symbionts within a single monophyletic group of the hosts.

Among the bacterial taxa that are best suited for the investigation of symbiogenesis is the gammaproteobacterial genus Arsenophonus, which belongs to the most common insect symbionts. Life strategies of its different strains vary from parasites, e.g., reproductive manipulators, to obligate mutualists necessary for the host reproduction and/or development (4). In insects living exclusively on vertebrate blood, Arsenophonus manifests both symbiotic lifestyles, i.e., facultative symbiont and obligate mutualist. For example, in triatomine bugs, it is distributed across various tissues in several species, displaying features of a “young” facultative symbiont with an uncertain role for the host (5–8). In contrast, Arsenophonus strains with the characteristics typical for obligate mutualists were described from sucking lice of the family Pediculidae and several species of blood-feeding dipterans (Hippoboscidae). The lice-associated Arsenophonus possesses such a dramatically divergent genome that it was initially not recognized as Arsenophonus and had been described as a separate genus, Riesia (9, 10). In Hippoboscidae, the situation is much more complex, suggesting a dynamic process of Arsenophonus acquisitions and replacements (11). This insect group is one of the blood-feeding families constituting the superfamily Hippoboscoidea, together with Glossinidae, Nycteribiidae, and Streblidae (12). Similar to other Hippoboscoidea, they live exclusively on vertebrate blood. While they undergo holometabolous development, their larvae live in the female uterus and are nourished via milk glands (13). Due to this unique system, called adenotrophic viviparity, hippoboscid individuals utilize vertebrate blood as an exclusive dietary source throughout the whole life cycle. As such, they face a shortage of some important dietary components, for which they depend on obligate symbiotic bacteria. B vitamins are the most often suggested candidates for these missing components (14). However, except for two complete Arsenophonus genomes from Melophagus ovinus and Lipoptena cervi, only 16S rRNA or a few other genes are available for the rest of the strains. This hampers both a reliable phylogenetic reconstruction (essential for elucidation of evolutionary history) and evaluation of metabolic capacities in different strains.

The ability to predict metabolic competence is particularly important, as the metabolic role of obligate mutualists in exclusively blood-feeding insects is unclear. The hypothesis that B vitamins are the main compounds provided by the symbionts to obligate blood feeders was proposed many decades ago based on experimental works (15–17) and supported later by identification of crucial genes for B vitamin biosynthesis or even complete operons horizontally transferred to these symbionts (18, 19). However, comparison of studies does not provide a consistent picture of the necessity of different vitamins for blood-feeding hosts. For example, thiamine was shown to be supplemented by Wigglesworthia to its host, the tsetse fly (20). In contrast, this pathway is disrupted in the Arsenophonus strains from the two Glossina-related blood feeders, the hippoboscid species M. ovinus and L. cervi (21, 22), and also in the genomes of several louse-associated obligate symbionts (19, 23–25).

The Hippoboscidae-associated Arsenophonus, with multiple independent strains in different stages of evolution (11), provide an ideal system for studying genome changes during the transition toward obligate mutualism and possible evolutionary convergences. To address this process, we reconstruct nine new Arsenophonus genomes from eight hippoboscid species representing five genera, Pseudolynchia, Hippobosca, Ornithoica, Ornithomya, and Crataerina (Crataerina hirundinis, previously classified within genus Stenepteryx). Together with other available Arsenophonus genomes (including symbionts of hippoboscid species M. ovinus and L. cervi), we analyze phylogenetic relationships of these bacteria, mainly with a focus on their symbiosis establishment and evolution, i.e., the dynamics of the Arsenophonus acquisition across different host species. We use the whole-genome characteristics to evaluate the nature of their symbioses and compare their potential to provide the hosts with essential metabolites.

MATERIALS AND METHODS

Sample preparation

The samples originated from several sources. Some were collected by our lab members in 2012 (Ornithoica turdi and Hippobosca equina) and 2020 (Crataerina spp.). The samples of Ornithomya spp. were provided in 2019 by the Department of Zoology (Faculty of Science, University of South Bohemia, Czech Republic). The Pseudolynchia canariensis sample was acquired from Dr. Kayce C. Bell (Department of Mammalogy, Natural History Museum of Los Angeles County, Los Angeles, CA, USA). The complete list with collection sites and avian/mammal hosts is provided in Table S1A and B. All samples were stored in 96% ethanol at −20°C. The DNA was extracted using QIAamp DNA Micro Kit (Qiagen) from the abdomen of each individual. DNA quality was assessed by gel electrophoresis, and its concentration was measured with a Qubit High sensitivity kit.

Genome assembly

All samples were sequenced on the Illumina NovaSeq6000 platform (W. M. Keck Center, University of Illinois at Urbana-Champaign, IL, USA) generating 2 × 250 bp paired-end reads. The quality of raw reads was checked using FastQC (26), and the low-quality read ends were trimmed using BBTools (https://jgi.doe.gov/data-and-tools/bbtools). The number of resulting reads for each data set is provided in Table S1C. The trimmed reads were assembled using SPAdes v.3.10 (27), producing metagenomic assemblies. The bacterial contigs were identified by blasting all genes from Arsenophonus nasoniae FIN (NZ_CP038613.1), Arsenophonus melophagi (SAMN33924085), and Arsenophonus lipoptenae (NZ_CP013920.1) as queries against each metagenomic assembly using custom BLASTn implemented in the Geneious Prime v.2020.2.5 program (28). We retrieved five complete Arsenophonus genomes from Ornithomya avicularia (936,503 bp), Ornithomya biloba (874,825 bp), Ornithomya fringillina (933,061 bp), Crataerina hirundinis (919,248 bp), and Ornithoica turdi (599,419 bp), and four genome drafts from Hippobosca equina (530 contigs), two specimens of Crataerina pallida (301 and 333 contigs), and Pseudolynchia canariensis (355 contigs).
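The contig-identification step described above can be illustrated with a minimal sketch. It assumes the BLASTn hits were exported in the standard 12-column tabular format, with reference genes as queries and assembly contigs as subjects. The function name, the identity and length thresholds, and the example hits are hypothetical and are not the settings used in this study.

```python
import csv
import io

# Default column order of BLAST tabular output (12 standard fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_contigs(blast_tsv, min_identity=70.0, min_length=100, min_hits=1):
    """Return contig IDs hit by at least `min_hits` distinct reference genes.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    hits = {}
    reader = csv.DictReader(io.StringIO(blast_tsv), fieldnames=FIELDS,
                            delimiter="\t")
    for row in reader:
        # Keep only hits that pass both the identity and length filters.
        if float(row["pident"]) >= min_identity and int(row["length"]) >= min_length:
            hits.setdefault(row["sseqid"], set()).add(row["qseqid"])
    return sorted(c for c, genes in hits.items() if len(genes) >= min_hits)


# Hypothetical hits: two reference genes match contig_12, one weak hit on contig_77.
example = (
    "geneA\tcontig_12\t92.5\t850\t60\t2\t1\t850\t1000\t1849\t0.0\t1100\n"
    "geneB\tcontig_12\t88.0\t400\t45\t3\t1\t400\t5000\t5399\t1e-120\t430\n"
    "geneC\tcontig_77\t65.0\t90\t30\t1\t1\t90\t10\t99\t1e-5\t50\n"
)
print(candidate_contigs(example))
```

In practice, contigs retained by such a filter would still need manual inspection, since genes conserved among related gammaproteobacteria may also produce hits on contigs of other bacteria.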

Since the genome of Arsenophonus symbiont in Hippobosca equina was highly fragmented (530 short contigs), we sequenced this particular sample also using Oxford Nanopore GridIONx5 (W.M. Keck Center, University of Illinois at Urbana-Champaign, IL, USA) generating 3,127,655 reads with the length range from 880 to 40,881 bp. Their quality was assessed using NanoPack Tools (29) calculating statistics that pointed at overall shorter read length and their lower quality (mean read quality: 14.1, mean read length: 4,506 bp), most probably caused by fragmentation and yield of the used DNA template. The quality trimming was then performed in Filtlong (https://github.com/rrwick/Filtlong). To assemble the H. equina nanopore reads, we employed Canu assembler (30), which yielded 3,894 contigs. The nanopore filtered reads were mapped onto the assembly using Minimap2 (31). The assembly was polished in the following steps. First, we employed consensus calling and polish using Racon (32) and two iterations of Medaka polish (https://github.com/nanoporetech/medaka). Second, to obtain optimal sequence correctness, we mapped Illumina reads onto the assembly using Minimap2, and then we used Racon tool to correct the assembly by consensus generation and polishing. The resulting assembly consisted of 3,595 contigs. The filtering BLAST procedures (as described above) retrieved 16 contigs of the Arsenophonus symbiont. All new Arsenophonus genomes were annotated using PROKKA v.1.12 (33) and deposited in the GenBank under BioProject accession number PRJNA949118 (Table 1). Annotations for the corresponding genomes are deposited in Mendeley Data under the doi link https://doi.org/10.17632/8jsd4ds57k.3.

TABLE 1.

List of the Arsenophonus genomes (ordered by genome size) and their main genomic characteristics a

| Species/strain | Insect host | Genome size (bp) | No. of proteins: annotated/hypothetical | %GC | No. of phages | No. of transposons | No. of mobile elements | No. of ankyrin repeats | No. of invasions | No. of pseudogenes: predicted genes/no. ORF predicted | No. of phages by PHASTER: intact/incomplete/questionable | Accession (NCBI) | Host taxonomy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A. nasoniae | Nasonia vitripennis | 4,987,107 | 5,555/3,198 | 38.7 | 29 | 2 | 0 | 1 | 22 | 1,307/374 | 23/14/9 | GCA_004768525.1 | Hy |
| A. triatominarum | Triatoma infestans | 4,721,517 | 2,118/1,481 | 37.8 | 19 | 0 | 0 | 0 | 10 | 4,057/1,161 | 22/37/25 | GCA_001640365.1 | T |
| A. nasoniae | N. vitripennis | 3,670,548 | 3,499/1,434 | 37.5 | 17 | 2 | 0 | 1 | 16 | 534/121 | 1/8/2 | GCA_000429565.1 | Hy |
| Arsenophonus apicola | Apis mellifera | 3,639,254 | 3,143/1,176 | 37.8 | 3 | 7 | 0 | 1 | 16 | 687/202 | 8/6/2 | GCA_020268605.1 | Hy |
| A. apicola | A. mellifera | 3,315,739 | 3,081/1,069 | 37.5 | 3 | 6 | 0 | 1 | 15 | 433/110 | 1/7/2 | GCA_903968575.1 | Hy |
| Arsenophonus sp. | Entylia carinata | 3,228,533 | 3,689/1,967 | 39.6 | 7 | 0 | 0 | 0 | 5 | 1,128/160 | 0/14/1 | GCA_002287155.1 | He |
| Arsenophonus sp. | Hippobosca equina | 3,226,959 | 3,080/1,049 | 37.5 | 8 | 1 | 0 | 0 | 17 | 589/135 | 8/6/1 | SAMN33923977 | Hi |
| Arsenophonus sp. | Crataerina pallida | 3,053,565 | 3,098/1,158 | 38.1 | 6 | 0 | 0 | 0 | 12 | 718/134 | 0/7/0 | SAMN33923976 | Hi |
| Arsenophonus sp. | Aleurodicus floccissimus | 3,001,875 | 4,458/1,984 | 37.0 | 2 | 0 | 0 | 0 | 14 | 3,436/929 | 8/8/0 | GCA_900343025.1 | He |
| Arsenophonus sp. | Nilaparvata lugens | 2,953,863 | 2,658/789 | 37.6 | 2 | 1 | 0 | 0 | 13 | 328/83 | 2/2/1 | GCA_000757905.1 | He |
| Arsenophonus sp. | Crataerina pallida | 2,844,341 | 2,805/980 | 37.9 | 6 | 0 | 0 | 0 | 11 | 652/128 | 0/9/0 | SAMN33923975 | Hi |
| Arsenophonus sp. | Aphis craccivora | 2,424,437 | 3,206/1,427 | 40.1 | 2 | 0 | 0 | 0 | 3 | 2,103/718 | 2/5/2 | GCA_013460135.1 | He |
| Arsenophonus sp. | Bemisia tabaci | 2,328,823 | 2,711/869 | 38.3 | 2 | 1 | 0 | 0 | 2 | 1,421/339 | 0/1/1 | GCA_902713415.1 | He |
| Arsenophonus sp. | Bemisia tabaci | 1,860,497 | 2,411/894 | 36.9 | 1 | 3 | 0 | 0 | 1 | 1,596/467 | 0/5/1 | GCA_004118055.1 | He |
| Arsenophonus sp. | Pseudolynchia canariensis | 1,229,579 | 1,313/469 | 37.6 | 2 | 1 | 0 | 0 | 5 | 316/54 | 1/1/0 | SAMN33923974 | Hi |
| A. melophagi | Melophagus ovinus | 1,155,312 | 725/62 | 32.2 | 0 | 0 | 0 | 0 | 0 | 39/21 | 0/0/0 | SAMN33924085 | Hi |
| Arsenophonus sp. | Ornithomya avicularia | 936,503 | 660/38 | 24.7 | 0 | 0 | 0 | 0 | 0 | 41/11 | 0/1/0 | SAMN33923973 | Hi* |
| Arsenophonus sp. | Ornithomya fringillina | 933,061 | 660/37 | 29.4 | 0 | 0 | 0 | 0 | 0 | 34/7 | 0/1/0 | SAMN33923972 | Hi* |
| Arsenophonus sp. | Crataerina hirundinis | 919,248 | 661/41 | 28.8 | 0 | 0 | 0 | 0 | 0 | 45/14 | 0/1/0 | SAMN33923971 | Hi* |
| Arsenophonus sp. | Ornithomya biloba | 874,825 | 621/31 | 27.6 | 0 | 0 | 0 | 0 | 0 | 37/13 | 0/2/0 | SAMN33923970 | Hi* |
| Arsenophonus sp. | Ceratovacuna japonica | 853,149 | 528/29 | 18.5 | 0 | 0 | 0 | 0 | 0 | 42/22 | 0/1/0 | GCA_024349725.1 | He |
| A. lipoptenae | Lipoptena cervi | 836,724 | 633/33 | 24.9 | 0 | 0 | 0 | 0 | 0 | 27/13 | 0/2/0 | GCA_001534665.1 | Hi |
| Arsenophonus sp. | Aleurodicus dispersus | 663,125 | 454/37 | 32.2 | 0 | 0 | 0 | 0 | 0 | 177/114 | 0/1/0 | GCA_900343015.1 | He |
| Arsenophonus sp. | Ornithoica turdi | 599,419 | 565/23 | 23.6 | 0 | 0 | 0 | 0 | 0 | 27/8 | 0/3/0 | SAMN33923969 | Hi |
| Riesia pediculicola | Pediculus humanus capitis | 574,390 | 480/31 | 28.5 | 0 | 0 | 0 | 0 | 0 | 24/3 | 0/1/0 | GCA_000093065.1 | A |
| Riesia pediculischaeffi | Pediculus shaeffi | 566,667 | 496/53 | 31.7 | 0 | 0 | 0 | 0 | 0 | 57/18 | 0/2/0 | GCA_002073895.1 | A |
| Riesia sp. | Pthirus gorillae | 528,700 | 461/24 | 25.1 | 0 | 0 | 0 | 0 | 0 | 27/6 | 0/1/0 | GCA_002074035.1 | A |

a Host taxonomy: Hy, Hymenoptera; T, Triatominae; He, Hemiptera; Hi, Hippoboscidae (obligate Hippoboscidae symbionts in bold, Ornithomya-Crataerina cluster designated by asterisks); A, Anoplura (sucking lice).

Symbiont phylogeny

To determine the phylogenetic position of the newly sequenced Arsenophonus strains, we downloaded from the National Center for Biotechnology Information (NCBI) database all available genomes of Arsenophonus and four additional gammaproteobacteria as outgroups, Haemophilus parainfluenzae (NZ_CP007470.1), Sodalis glossinidius (AP008232), Proteus mirabilis (NC_010554.1), and Providencia stuartii (NZ_CP095443.1). For this data set, we built the matrix by concatenating 57 shared single-copy orthologs determined by OrthoFinder (34). The sequences were aligned in MAFFT v.7.450 (35) under E-INS-i settings implemented in Geneious Prime v.2020.2.5 platform (28) and processed by Gblocks (36) using the options for more stringent selection. The resulting matrix of 10,885 amino acids was analyzed by several alternative phylogenetic approaches. Maximum likelihood trees were retrieved by PhyML v.3.0 (37) and IQ-TREE v.2 (38); the Bayesian tree was inferred by PhyloBayes-MPI (39). PhyML analysis was performed under the best-fitting model CpREV+R+F selected using Smart Model Selection tool (40) with 100 bootstrap replicates. IQ-TREE v.2 analysis was run with 1,000 ultrafast bootstraps under the cpREV+F+I+I+R4 selected by the program according to BIC. Since the data contained taxa with extremely different branch lengths and nucleotide composition, we used PhyloBayes with CAT-GTR model to minimize the artifacts (41). The analysis was run for 30,000 generations. The chain convergence was assessed using the bpcomp and tracecomp commands (checking for the maximum bipartition difference and effective population size of the sample). As an alternative to this standard PhyloBayes analysis, we also analyzed a matrix recoded by Dayhoff6 scheme, which could possibly further decrease the artifacts caused by the composition heterogeneity (42). The list of accession numbers for the used genomes and orthologs is provided in Table S1D and E. The alignment and the phylogenetic trees are deposited in Mendeley Data under the doi link https://doi.org/10.17632/8jsd4ds57k.3.
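The Dayhoff6 recoding mentioned above collapses the 20 amino acids into six classes of biochemically similar residues, which reduces the compositional signal in the matrix. The following is a minimal sketch of the idea using the commonly cited six Dayhoff classes. The digit labels and the function name are arbitrary choices made for this illustration and are not the internal implementation of PhyloBayes.

```python
# The six Dayhoff classes; the digit assigned to each class is an arbitrary label.
DAYHOFF6 = {
    "AGPST": "0",
    "DENQ": "1",
    "HKR": "2",
    "ILMV": "3",
    "FWY": "4",
    "C": "5",
}
RECODE = {aa: code for group, code in DAYHOFF6.items() for aa in group}


def recode_dayhoff6(seq):
    """Recode an aligned protein sequence into the six-state alphabet.

    Gaps are kept as '-'; any other symbol (e.g. X) becomes '?'.
    """
    out = []
    for aa in seq.upper():
        if aa in RECODE:
            out.append(RECODE[aa])
        elif aa == "-":
            out.append("-")
        else:
            out.append("?")
    return "".join(out)


print(recode_dayhoff6("MKV-CX"))  # prints 323-5?
```

Recoding discards some phylogenetic information together with the compositional bias, so trees from recoded and original matrices are best compared rather than treated as interchangeable.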

Host phylogeny and coevolutionary analyses

The phylogeny of hosts was reconstructed for two reasons. First, for the newly assembled samples, it served for the verification of the hosts’ taxonomic determination. Second, it was used to evaluate coevolutionary history and independent acquisitions of the Arsenophonus symbionts. For the new samples, we identified mitochondrial genomes by blasting (BLASTn) the complete mitochondrial genome of Melophagus ovinus against the whole assemblies. The contigs corresponding to the mitochondrial genomes were then annotated in Mitos server (43). To construct the matrix, we used relevant COI sequences from the new mitochondrial genomes, other COI sequences of hippoboscids available in NCBI, and Glossina and Gasterophilus as outgroups (Table S1F). The sequences were aligned by MAFFT v.7.450 under E-INS-i settings implemented in Geneious Prime v.2020.2.5 platform (the alignment did not contain any indels which would require codon-based aligning procedure). The trees were inferred by IQ-TREE v.2 (1,000 ultrafast bootstraps under the model GTR+F+I+G4 selected by the program).

To utilize all information provided by the mitochondrial genomes, we constructed and analyzed another matrix composed of 12 mitochondrial protein-coding genes of the taxa for which complete or nearly complete mitochondrial genomes are available (Table S1G). The sequences were aligned by MAFFT v.7.450 under the E-INS-i settings implemented in Geneious Prime. The matrix was processed in Gblocks using options for a more stringent selection. The resulting matrix consisted of 3,205 aa. The concatenated multigene phylogenetic matrix was analyzed using IQ-TREE v.2 (1,000 ultrafast bootstraps under the mtInv+I+G4 model selected by the program). The alignments and the phylogenetic trees are deposited in Mendeley Data under the doi https://doi.org/10.17632/8jsd4ds57k.3.

Genome content and metabolic capacity

Since the analyzed genomes were obtained from different sources (our own assemblies and NCBI downloads), we first performed their de novo annotation by PROKKA v.1.12 (33) to obtain a standardized input for the downstream analyses. Based on these annotations, we inferred the main parameters of the genome contents (number of genes, phages, mobile elements, etc.; Table 1). The number of phages was further verified by PHASTER (44). To estimate number of pseudogenes, we used the Pseudofinder program (45) with default settings. Since some of the symbionts were potentially closely related, we investigated possible synteny of their genomes using two programs, Mauve (46) and Clinker (47).
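Two of the basic genome parameters reported in Table 1, assembly size and GC content, can be derived directly from a FASTA file. The sketch below shows one way to do so. The function names and the toy sequences are hypothetical, and computing GC content over unambiguous bases only is an assumption of this example.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def genome_summary(records):
    """Return contig count, total size, and GC% (over A/C/G/T only)."""
    total = sum(len(s) for s in records.values())
    gc = sum(s.count("G") + s.count("C") for s in records.values())
    acgt = sum(s.count(b) for s in records.values() for b in "ACGT")
    gc_percent = round(100.0 * gc / acgt, 1) if acgt else 0.0
    return {"contigs": len(records), "size_bp": total, "gc_percent": gc_percent}


# Toy two-contig assembly; N bases count toward size but not toward GC%.
toy = ">c1 example\nATGCATGC\n>c2\nAATTNN\n"
summary = genome_summary(parse_fasta(toy))
print(summary)
```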

Metabolic capacities were assessed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) server (48). K numbers, which assign metabolic functions to the annotated genes, were identified for each genome by the BlastKOALA server (48). Using the KEGG-based structure of metabolic pathways, we mapped these capacities for each genome (Table S2). Based on this overview, and using KEGG metabolic pathways as a background, we assessed completeness and therefore potential functionality of the pathways for the synthesis of B vitamins and amino acids.
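The assessment of pathway completeness from K numbers can be sketched as follows. Each pathway is represented as an ordered list of steps, and each step as a set of alternative K numbers, any one of which is taken to satisfy the step. This representation, the function name, and all identifiers in the example are hypothetical placeholders rather than real KEGG entries.

```python
def pathway_completeness(genome_kos, pathway_steps):
    """Return (fraction of steps covered, 1-based indices of missing steps).

    genome_kos: set of K numbers assigned to one genome.
    pathway_steps: list of sets; each set holds alternative K numbers for a step.
    """
    present = [bool(genome_kos & alternatives) for alternatives in pathway_steps]
    missing = [i + 1 for i, ok in enumerate(present) if not ok]
    return sum(present) / len(pathway_steps), missing


# Placeholder identifiers; these are not real KEGG K numbers.
toy_pathway = [{"K_a"}, {"K_b", "K_b2"}, {"K_c"}, {"K_d"}]
toy_genome = {"K_a", "K_b", "K_d", "K_unrelated"}

fraction, missing_steps = pathway_completeness(toy_genome, toy_pathway)
print(fraction, missing_steps)  # prints 0.75 [3]
```

A missing step in such a tally does not by itself show that the pathway is nonfunctional, since a gene may be misannotated, fragmented in a draft assembly, or replaced by a nonhomologous enzyme.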

In order to visualize similarities among the genome contents, we employed MicroEco v.0.13.0 (49) package in R environment and performed principal coordinate analysis (PCoA) based on Bray-Curtis distances calculated for two data sets: the first composed of all orthologs identified by OrthoFinder and the second composed of genes with assigned K numbers. The graphical outputs were generated utilizing ggplot2 v.3.4.0 (50) and further processed in Adobe Illustrator v.25.4.1.
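The Bray-Curtis distance underlying the PCoA is the sum of absolute differences between two gene-content profiles divided by the sum of all their counts. A minimal pure-Python sketch of the distance matrix is given below. The profile names and counts are invented for illustration, and the ordination step itself (performed in the study with the MicroEco package in R) is not reproduced here.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(profiles):
    """Return (sorted names, square matrix of pairwise dissimilarities)."""
    names = sorted(profiles)
    matrix = [[bray_curtis(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical gene counts per orthogroup (one column per orthogroup).
profiles = {
    "nascent_1": [2, 1, 1, 1, 0],
    "nascent_2": [1, 1, 1, 1, 1],
    "obligate_1": [1, 0, 0, 1, 0],
}
names, matrix = distance_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

For presence/absence data, this measure equals one minus the Sørensen-Dice similarity, so reduced genomes that retain only a subset of a larger gene set still appear distant from it.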

RESULTS AND DISCUSSION

The set of Arsenophonus symbionts from Hippoboscidae covered a broad range of evolutionary stages reflected by the genomic characteristics, such as genome size, guanine/cytosine (GC) content, presence of mobile genetic elements, and branch lengths in the phylogenetic trees. Of the nine new metagenomic assemblies, five produced complete closed Arsenophonus genomes (Ornithomya avicularia, O. biloba, O. fringillina, Crataerina hirundinis, and Ornithoica turdi), and four contained genomic drafts fragmented into contigs (Hippobosca equina in 16 contigs, two specimens of Crataerina pallida in 301 and 333 contigs, and Pseudolynchia canariensis in 355 contigs). The last genome (from P. canariensis) was of insufficient completeness and was therefore excluded from the interpretation of the metabolic analysis; it was, however, retained in the phylogenetic analyses as it provided a sufficient number of orthologs for building the matrix.

Arsenophonus phylogeny

Phylogenetic analyses revealed several origins of hippoboscid-associated Arsenophonus, supporting the results previously obtained by analyzing 16S rRNA genes (11). The trees inferred here by several methods from a multigene matrix agreed on several patterns. The most notable is the split between the small, long-branched genomes and the large, short-branched genomes. Seven of the Hippoboscidae symbionts formed long branches, suggesting that they have experienced a long history of vertical transmission and accelerated molecular evolution, leading to the divergence from the other Arsenophonus lineages (Fig. 1, red taxa; Fig. S1). The remaining four lineages were placed within a cluster of short branches with a poor inner resolution, indicating that these are recent symbioses where the symbionts are highly similar to other Arsenophonus lineages from different host groups (blue taxa in Fig. 1).

FIG 1.

PhyloBayes phylogeny of Arsenophonus symbionts derived from concatenated matrix of 57 shared single-copy orthologs (10,885 aa) under the CAT-GTR model. The analysis was run for 30,000 generations; maxdiff reached 0.04, and the effective sizes of the parameters were between 531 and 29,910. The hippoboscid-associated taxa considered in the text as “long-branched” and “short-branched” are printed in red and blue, respectively. The asterisks indicate the genomes newly assembled in this study. The arrow indicates the unstable node.

Another stable pattern is the monophyly of four symbionts obtained from three Ornithomya species and Crataerina hirundinis, which group on a long common branch (Ornithomya-Crataerina cluster hereafter). Less clear are the relationships of the remaining three long-branched symbionts (from the genera Ornithoica, Melophagus, and Lipoptena). In the trees, each formed a separate branch with no clear affinity with the other hippoboscid-associated symbionts. However, most of the hippoboscid-associated long-branched Arsenophonus symbionts cluster close to the extremely long Riesia branch. The topologies of all trees show a clear tendency to cluster according to the branch lengths, which suggests a possible role of a long-branch artifact (LBA [51]). Since we applied several approaches which should, in theory, be able to overcome this source of artifact, it is difficult to interpret this pattern. On the one hand, it could result from the LBA despite the algorithms used; on the other hand, it can reflect the real state and tendency of these clusters to establish obligate symbioses.

Genome characteristics and evolution

Comparison of the 11 Hippoboscidae-associated Arsenophonus strains illustrates typical genomic changes that accompany transition between the recently acquired symbionts (corresponding to the short-branched taxa in the phylogeny, called “nascent” hereafter) and the highly modified types, most likely representing obligate mutualists (the long-branched taxa, called “obligate” hereafter) (Table 1). Three of the Arsenophonus strains possess relatively large genomes of approximately 3 Mbp (one from Hippobosca equina and two from Crataerina pallida). This size indicates that the strains are likely recently acquired symbionts. Consistent with the genome size are also other genomic characteristics, such as GC content (around 38%) and presence/absence of genes connected to the fluidity of the genomes. For example, all these genomes contain several phage-associated genes and a high number of cell invasion-associated genes. Although the number of these genes is notably lower than in other “nascent” Arsenophonus strains (from Nasonia, Triatoma, and Apis), it sharply contrasts with their complete absence in the seven reduced “obligate” Arsenophonus strains. The latter are considerably smaller but still span a large range from 1.15 Mbp in Melophagus ovinus to 0.6 Mbp in Ornithoica turdi. In addition, their GC contents range from 32% to 23.6%. Such traits are also mirrored in the phylogenetic pattern. All these genomes form long branches that exceed all other taxa except for the three highly modified genomes of the louse-associated Arsenophonus (Riesia). Additional indirect support for their role as obligate, nutritionally essential mutualists comes from the symbiotic system described from the sheep ked, Melophagus ovinus (21). In this wingless insect, Arsenophonus was shown, using fluorescence in situ hybridization (FISH) and electron microscopy, to reside in the host bacteriome. Similarly to Wigglesworthia in tsetse flies (52), it was also localized in “milk glands,” suggesting vertical transmission from mother to progeny via this organ. Among the seven “obligate” Hippoboscidae-associated strains analyzed here, the one from M. ovinus possesses the largest genome with the highest proportion of GC. This indicates that the other strains with even smaller genomes and lower GC are also very likely mutualists. Finally, as described below, the obligate mutualistic nature of four of them (Ornithomya-Crataerina cluster) is supported by their cophylogeny with the hosts.

The difference in genome characteristics of “obligate” and “nascent” symbionts (Table 1), particularly for the repetitive sequences, can likely explain qualitatively different assembly outcomes. For five “obligate” symbionts, we were able to assemble complete closed genomes. Also, of the NCBI-downloaded genomes, the smaller one, from Lipoptena, is available in a single complete contig. In contrast, for the “nascent” symbionts, we could only recover fragmented assemblies, despite the fact that for the Hippobosca equina sample, we also generated long nanopore reads (see Materials and Methods). The Arsenophonus strain from P. canariensis possessed a seemingly mixed set of characteristics. In the phylogeny, it clustered on a short branch, among other “nascent” strains. It contained phage-related sequences, and its GC content (37.6%) was also close to the other “nascent” symbionts. On the other hand, the size (1.23 Mbp) resembled that of the “obligate” symbionts (Table 1). However, we presume that the genome size is the result of an incomplete assembly rather than a reductive evolutionary process. This view is the simplest (most parsimonious) explanation of the observed conflict and is further supported by the notable inconsistency of the metabolic reconstruction when compared to the other analyzed genomes (Table S2).

The split between “obligate” and “nascent” strains is well reflected in the PCoA analyses of the two genome-content characteristics (shared orthologs and shared genes with assigned K numbers). Both analyses showed clear separation between the two symbiont types along the main axis, which explains a very high portion of the variability (75.5% and 78.7%). The gap between these two forms is interesting and fits the general view on the evolutionary tempo during symbiogenesis. It assumes that once established in the host, the bacteria experience a period of rapid genomic changes (mainly degeneration), after which the tempo of the changes slows down, potentially reaching almost complete stasis (53–56). Therefore, it may be more difficult to capture a genome in the fast-evolution phase, rather than in the stages close to the free-living ancestor or the obligate symbionts (although an artifact due to the possibly unrepresentative number of 26 strains cannot be excluded). Apart from this separation, the analyses yielded additional interesting patterns. The seven “obligate” strains associated with the hippoboscids group within a common cluster (orange ellipses in Fig. 2), although they are scattered in a nonmonophyletic manner across the phylogenetic tree (Fig. 1). In contrast, the three Riesia strains (pink ellipses in Fig. 2), while phylogenetically closely related to the Ornithomya-Crataerina cluster, form a distinct and relatively distant group (on the second axis) from the hippoboscid strains. This may imply a convergent adaptation of unrelated strains to particular host groups (hippoboscids vs lice). On the other hand, this conflict between phylogeny and PCoA clustering could also, in theory, be due to artificial phylogeny, distorted by LBA.

FIG 2.

PCoA analyses of genome contents calculated using Bray-Curtis distances among all protein-coding genes (A) and genes with assigned K numbers (B) found across 26 analyzed Arsenophonus genomes. Ellipses (not statistical) show long-branched (obligate) strains associated with Hippoboscidae (orange) and lice (pink). The arrow points to the outlier (see text). The gray-shaded area is explained in the text.

The clustering patterns derived from the two different criteria (shared orthologs and shared K numbers) differed significantly on the second axis. In ortholog-based analysis (Fig. 2A), the “nascent” and “obligate” strains were broadly distributed, indicating a considerable degree of variability. In the analysis based on K numbers (Fig. 2B), the “nascent” strains aggregated in a narrow strand, while the “obligate” strains were broadly distributed (indicated by the dark gray background). This pattern shows that “nascent” strains share a significantly higher number of the metabolically important genes with assigned functions (i.e., K numbers; Fig. 2B) than all identified orthologs (including “non-necessary” genes). The difference between the “nascent” and “obligate” strains on the second axis in the analysis based on K numbers fits well with the general view of the evolution of symbionts. While “nascent” symbionts with large genomes possess similar metabolic capacities, “obligate” symbionts exhibit much greater variability, likely due to their adaptation to different conditions and metabolic demands. Since in Fig. 2B the broad distribution of “obligate” symbionts could be caused by the presence of an outlier (Aleurodicus dispersus; indicated by an arrow), we performed another analysis without this genome. Its result (Fig. S2) shows a pattern similar to that in Fig. 2B.

Coevolution with the host

To estimate the coevolutionary pattern in the Arsenophonus-Hippoboscidae association, we compared the Arsenophonus tree with the phylogeny derived from the whole mitochondrial genomes available for the hosts (Fig. 3). The relationship between the Hippoboscidae and Arsenophonus phylogenies was consistent with the general genomic characteristics of the symbionts. The only part of the Arsenophonus tree that shows a clear sign of cospeciation is the monophyletic lineage composed of long-branched strains of the Ornithomya-Crataerina cluster. This hippoboscid phylogenetic lineage also provides an example of the complex processes involving both cospeciation and replacement of symbionts. On the host side, the branch is composed of five taxa (two species of Crataerina and three species of Ornithomya). However, only Ornithomya and C. hirundinis carry these Arsenophonus “obligate” strains. In contrast, two samples of C. pallida harbor Arsenophonus with a large genome and characteristics typical of “nascent” symbionts. In the Arsenophonus phylogeny, this strain clusters among the other “nascent” symbionts of several insect hosts. This arrangement suggests the establishment of the symbiosis prior to the Ornithomya-Crataerina diversification, followed by cospeciation of the symbiont with the Ornithomya-Crataerina cluster, and a replacement of this strain with a new, phylogenetically distinct Arsenophonus in C. pallida (Fig. 3). The correct phylogenetic placement of our samples in the hippoboscid phylogeny/taxonomy was further confirmed by the additional analysis based on the COI gene available for a broader taxonomic sample (Fig. S3).

FIG 3

Coevolutionary analysis of the Arsenophonus strains and their hippoboscid hosts. (Left) Arsenophonus tree simplified from Fig. 1 (only Hippoboscidae-associated strains are shown). (Right) Hippoboscidae tree derived from a matrix of 12 concatenated mitochondrial genes (3,205 aa) in IQ-TREE v.2 under the mtInv + I + G4 model. The four presumably independent establishments of the symbiosis are indicated by different colors highlighting the host-symbiont connections. The connections without highlights indicate an uncertain evolutionary interpretation.

Less clear is the evolutionary history of the other “obligate” strains, i.e., the symbionts of Ornithoica, Lipoptena, and Melophagus. In the phylogenetic analyses, the obligate strains associated with hippoboscids never formed a monophyletic branch. This arrangement could in theory be affected by long-branch attraction, since in some analyses the Ornithoica, Lipoptena, and Ornithomya-Crataerina strains did form a paraphyletic group with respect to the extremely long-branched Riesia. However, based on several phylogenetic and genomic patterns described below, we assume that the four mentioned lineages (Melophagus, Lipoptena, Ornithoica, and Ornithomya-Crataerina) originated independently within the genus Arsenophonus. An independent origin can be most robustly supported for the Melophagus symbiont. While located on a long branch, this strain invariably falls into a monophyletic group with four “nascent” strains from Nasonia vitripennis and Apis mellifera, at a position phylogenetically distant from the cluster formed by several long branches. The position of the remaining three lineages (Lipoptena, Ornithoica, and Ornithomya-Crataerina) varied with the phylogenetic analyses, but they never formed a monophyletic clade. The phylogenetic arrangement of these three lineages was the least stable and least supported part of the whole phylogeny (Fig. S1). Their independent origins are supported by comparisons of their genome arrangements. It is well known that bacterial genomes are fluid and, owing to the presence of various categories of genes (such as phage-associated genes and mobile elements), can rapidly reorganize their gene content and lose genomic synteny (57). This is true for free-living bacteria and symbionts in the early stage of evolution. In contrast, once they get rid of the fluidity-associated genes, the symbionts lose the rearrangement capacity and retain a high degree of synteny (53, 58).
In our set, all “obligate” strains (including those from Ceratovacuna japonica and Aleurodicus dispersus) lack phage-associated genes as well as transposons, in contrast to the “nascent” strains (Table 1). However, only the four strains of the Ornithomya-Crataerina cluster show a high degree of genome similarity and synteny (Fig. 4). Despite this, their genomes are not identical but differ in gene content. This provides evidence that the genomes have reached the point at which they are no longer capable of gene rearrangements and only continue to lose genes. In particular, the O. biloba symbiont possesses a genome considerably smaller than those of the other three strains. Among the genes that this smallest genome lacks are, e.g., those of the complete pathway for heme synthesis, which is present in the other three genomes (Fig. 4B). The synteny within the Ornithomya-Crataerina cluster contrasts strikingly with the other three lineages of the “obligate” symbionts, for which no sign of synteny could be found. The lack of synteny indicates two possible scenarios. First, the four lineages represent four independent origins of Arsenophonus symbiosis. Second, they share a common, possibly facultative, ancestor, and their diversification into the four nonsyntenic lineages had taken place before they lost the fluidity-associated genes. Taking all of the phylogenetic and genomic evidence together, we consider the first scenario much more likely.
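The distinction drawn here between gene loss (which leaves the order of the remaining genes intact) and rearrangement (which breaks it) can be made concrete with a simple proxy. The sketch below is not the method used to produce Fig. 4. It is a simplified, hypothetical measure that assumes each genome is given as a linear, unsigned list of single-copy ortholog identifiers, and it reports the fraction of neighbouring gene pairs in one genome that are also neighbours in the other after genes missing from either genome are dropped.

```python
def shared_order(order, shared):
    """Keep only genes present in both genomes, preserving their order."""
    return [gene for gene in order if gene in shared]


def adjacency_conservation(order_a, order_b):
    """Fraction of neighbouring gene pairs in A that are also neighbours in B.

    Gene orders are treated as linear and unsigned, and every gene is assumed
    to occur once. Genes absent from either genome are removed first, so gene
    loss alone does not count as a broken adjacency.
    """
    shared = set(order_a) & set(order_b)
    a = shared_order(order_a, shared)
    b = shared_order(order_b, shared)
    if len(a) < 2:
        return 0.0
    pairs_b = {frozenset(pair) for pair in zip(b, b[1:])}
    kept = sum(1 for pair in zip(a, a[1:]) if frozenset(pair) in pairs_b)
    return kept / (len(a) - 1)


# Hypothetical toy gene orders (identifiers are placeholders).
reference = ["g1", "g2", "g3", "g4", "g5", "g6"]
reduced = ["g1", "g2", "g4", "g5", "g6"]         # one gene lost, order kept
shuffled = ["g4", "g1", "g6", "g3", "g2", "g5"]  # same genes, rearranged

print(adjacency_conservation(reference, reduced))
print(adjacency_conservation(reference, shuffled))
```

Under these assumptions, a genome that has only lost genes keeps all adjacencies among the shared genes, whereas a rearranged genome keeps few of them. This mirrors the contrast described above between the syntenic Ornithomya-Crataerina strains, which differ mainly in gene content, and the other “obligate” lineages.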

FIG 4

Genomic comparison of the four strains of the Ornithomya-Crataerina cluster. (A) Synteny of the genomes (alignment built with the MAFFT algorithm). Aob is printed in red to indicate the shortest genome. Green triangles represent protein-coding genes (the genes shown in the second row of each genome overlap with the first-row genes); orange triangles stand for rRNA genes. (B) Venn diagram of shared genes among the four strains of the Ornithomya-Crataerina cluster. Arsenophonus (O. biloba) is printed in red to indicate the shortest genome. Its difference (missing genes) compared to the other strains is indicated by the orange frame. Structures of the heme pathway, mismatch repair, and arginine transporter are adopted from the KEGG database. Strain abbreviations: A, Arsenophonus and the host; Ch, C. hirundinis; Oa, O. avicularia; Ob, O. biloba; Of, O. fringillina.

Metabolic capacity of the Arsenophonus symbionts

The overview presented in Fig. 5 (details in Table S2) shows that the pathways of the eight B vitamins differ in their completeness across the Arsenophonus genomes. Only two of them, riboflavin and lipoic acid, are complete and therefore likely functional in all genomes. Other pathways that are widely present, but absent or likely incomplete in several genomes, are those for folate, biotin, and pyridoxal synthesis. The most interesting patterns were found for the thiamine and, particularly, pantothenate pathways. The thiamine pathway is clearly nonfunctional in all “obligate” strains (i.e., the obligate symbionts) except for the strain from Ornithoica turdi. In this small genome, the thiI annotation was assigned to a short sequence, but it was not recognized as a functional gene by BlastKOALA. Compared to other strains, the annotated gene was extremely shortened (288 bp vs 1,449–1,485 bp). While the absence of a single gene potentially makes a whole pathway nonfunctional, this is likely not the case for the thiamine pathway of the strain from Ornithoica turdi. First, the presence of the additional nine genes in such a reduced genome (compared with their absence in the other “obligate” strains) suggests selection in favor of preserving the pathway. Second, several sources of evidence suggest that a complete thiI may not be necessary for the functionality of the thiamine pathway. Martinez-Gomez et al. (59) showed experimentally that only part of the thiI gene, the rhodanese domain, is necessary for thiamine biosynthesis in Salmonella. Using the conserved domain database (60), we revealed that the retained fragment of thiI corresponds to the rhodanese domain (Fig. 5). A similar situation, that is, preservation of the rhodanese domain instead of the complete thiI gene, was reported from at least two other obligate symbionts, Wigglesworthia glossinidia from the tsetse fly Glossina morsitans (61) and Candidatus Pantoea from the brown marmorated stink bug (62).
However, in other cases, the pathway exists without an open reading frame (ORF) containing a rhodanese domain. In a comparison of six Rhodococcus species and Micrococcus luteus, all genomes lacked the thiI rhodanese domain while retaining the rest of the thiamine pathway (63). This contrast shows that our knowledge of thiamine synthesis by insect symbionts (or by bacteria in general) is currently incomplete. Interestingly, consistent with the preserved thiamine pathway, the genome of the strain from Ornithoica is also the only one among the “obligate” strains that does not encode a thiamine transporter. The only other genomes with a missing or incomplete thiamine transporter are Arsenophonus strains from Ceratovacuna japonica and Aleurodicus dispersus, which lack all three genes, and Aleurodicus floccissimus, with one missing gene (Fig. 5; Table S2).
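The size difference reported for thiI can be expressed as a simple length ratio, which is one way such a shortened annotation might be flagged for closer inspection. The sketch below is only an illustration: the function names and the 0.5 cutoff are assumptions, and the two reference lengths are the endpoints of the range quoted in the text, not individual gene lengths from Table S2.

```python
def length_ratio(gene_bp, reference_bp):
    """Length of a gene relative to the shortest full-length reference copy."""
    shortest = min(reference_bp)
    if shortest <= 0:
        raise ValueError("reference lengths must be positive")
    return gene_bp / shortest


def is_candidate_fragment(gene_bp, reference_bp, threshold=0.5):
    """Flag genes shorter than `threshold` times the shortest reference.

    The default threshold is an arbitrary illustrative value, not a cutoff
    used in the study.
    """
    return length_ratio(gene_bp, reference_bp) < threshold


# Lengths quoted in the text: 288 bp vs 1,449-1,485 bp in the other strains.
thiI_ratio = length_ratio(288, [1449, 1485])
print(f"thiI fragment covers {thiI_ratio:.1%} of the shortest full-length copy")
print(is_candidate_fragment(288, [1449, 1485]))
```

The retained thiI sequence is thus roughly one fifth of the full gene length. A length check of this kind only identifies candidates. Whether the fragment still covers a functional domain, as argued above for the rhodanese domain, requires a separate domain-level analysis.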

FIG 5

An overview of B vitamin biosynthetic capacities in the analyzed genomes. The completeness of the pathways was derived by assigning K numbers in BlastKOALA and comparing the results with the structure of the pathways in KEGG (see Materials and Methods). Arsenophonus strains associated with hippoboscids are printed in bold (red denotes long-branched/obligate; blue denotes short-branched/nascent). Asterisks denote genomes newly assembled in this study.

The distribution and origin of the three genes required for pantothenate biosynthesis (panB, panC, and panE) proved to be the most complex and biologically interesting. Most of the analyzed Arsenophonus genomes do not possess these genes. The absence of pantothenate-synthesis pathway genes in several of these Arsenophonus genomes was previously reported from both phytophagous and hematophagous hosts (21, 64, 65). This absence suggests that pantothenate biosynthesis may have been missing at the origin of the entire Arsenophonus clade (Arsenophonus synapomorphy). In contrast to the general absence of the pantothenate pathway in Arsenophonus bacteria, the four closely related hippoboscid obligate symbionts from the Ornithomya-Crataerina cluster and the strain from Ornithoica turdi possess this metabolic capacity. However, the origin and location of these genes differ among the symbionts. In Riesia, the genes have been known to reside on plasmids (25, 66). In this study, we found similar arrangements in the four “obligate” Arsenophonus strains from the monophyletic Ornithomya-Crataerina cluster (Fig. 5). All of these symbionts possess plasmids, with sizes from approximately 12 to 21 kbp, which carry the three pantothenate genes. A different picture was found in the “obligate” symbiont from Ornithoica turdi. Here, all three pantothenate genes are located directly on the bacterial chromosome.
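The bookkeeping behind this comparison, namely whether all three pantothenate genes are present and which replicon carries them, can be sketched as follows. The record format, function names, and toy annotation lists are hypothetical. They only mimic the three situations described in the text (plasmid-borne genes, chromosomal genes, and a set with panE not annotated) and do not reproduce the actual annotations.

```python
PAN_GENES = ("panB", "panC", "panE")  # the three genes named in the text


def locate_genes(annotations, required=PAN_GENES):
    """Map each required gene to the sorted replicon types that carry it.

    `annotations` is an iterable of (replicon_type, gene_name) pairs.
    """
    found = {gene: set() for gene in required}
    for replicon, gene in annotations:
        if gene in found:
            found[gene].add(replicon)
    return {gene: sorted(replicons) for gene, replicons in found.items()}


def summarize(location):
    """Return 'incomplete' or the replicon type(s) carrying the full set."""
    if any(not replicons for replicons in location.values()):
        return "incomplete"
    carriers = {r for replicons in location.values() for r in replicons}
    return "+".join(sorted(carriers))


# Hypothetical toy annotations; 'other_gene' is a placeholder name.
plasmid_borne = [("chromosome", "other_gene"), ("plasmid", "panB"),
                 ("plasmid", "panC"), ("plasmid", "panE")]
chromosomal = [("chromosome", "panB"), ("chromosome", "panC"),
               ("chromosome", "panE")]
lacking_panE = [("plasmid", "panB"), ("plasmid", "panC")]

for label, records in [("plasmid-borne", plasmid_borne),
                       ("chromosomal", chromosomal),
                       ("panE not annotated", lacking_panE)]:
    print(label, "->", summarize(locate_genes(records)))
```

A summary of this kind depends entirely on the underlying annotation, so a gene that is present but highly modified would be reported as missing.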

Based on this overview, the vitamin pathways in the “obligate” Arsenophonus strains can be classified into four categories with respect to their importance to the host:

  • Nonessential pathways: two of the vitamins, nicotinate and thiamine, are obviously not crucial, and most of the “obligate” symbionts here lack their synthesis capacity. The retention of a thiamine transporter indicates that they can scavenge this vitamin either from the host or from other bacteria, as suggested in several previous studies (18, 21, 67).

  • Universal pathways: riboflavin and lipoic acid are synthesized by all included strains. This omnipresence makes it difficult to decide whether these vitamins are required by the hosts or whether their synthesis is essential for the Arsenophonus cells themselves.

  • Majority pathways: the biotin, folate, and pyridoxal pathways are present in most of the strains. Folate is missing only in the considerably degenerated strain from Aleurodicus dispersus but present in all obligate symbionts. Similarly, pyridoxal is missing only in the degenerated strain from Ceratovacuna japonica. It is also seemingly incomplete in Riesia pediculicola and Riesia pediculischaeffi. However, the missing gene responsible for the terminal conversion of pyridoxine phosphate to pyridoxal phosphate is present in the Pediculus host genome (KEGG code Phum_PHUM170130). Biotin pathways are incomplete in several nascent symbionts and possibly in Riesia phthiripubis.

  • Essential pathway: pantothenate is the only vitamin whose importance for the Arsenophonus-Hippoboscidae symbiosis is strongly supported by its complex distribution pattern. It is absent in all nascent symbionts but present in all of the Hippoboscidae-associated obligate symbionts for which metagenomic data are available (in several cases coded on plasmids). In A. lipoptenae and A. melophagi, the pantothenate pathway is not encoded by the available genomes. In our previous work (22, 68), we did not find plasmids for these two Arsenophonus strains. However, we reported that apart from this obligate symbiont, the host Melophagus ovinus harbors two additional symbionts, Sodalis and Bartonella, both capable of pantothenate synthesis (21). Unfortunately, plasmid absence could not be verified in this study due to the unavailability of the raw data. An interesting situation was encountered in the three Riesia strains. All three pantothenate genes are annotated on the R. pediculicola USDA plasmid. On the basis of the PROKKA annotation and BlastKOALA assignments, the plasmids of the other two strains carry only panB and panC, while they lack panE. The DNA sequence corresponding to this locus is present in both plasmids, but it is highly modified and therefore is not recognized as a functional panE homolog (Fig. 6). Compared to B vitamins, the pathways for amino acid synthesis are greatly deteriorated in all “obligate” symbionts but mostly preserved in the “nascent” strains (Table S2).

FIG 6

Arrangement of pantothenate genes. Gray denotes other genes not involved in the pantothenate pathway. h, hypothetical protein; na, not annotated (not recognized as a gene) by PROKKA.

Conclusion

This study utilized Hippoboscidae-associated strains of Arsenophonus as a convenient system for studying the adaptation of bacteria to a mutualistic relationship with obligate blood feeders. In direct relation to the model used, the study produced several concrete findings. For example, the evolutionary history of the Hippoboscidae-Arsenophonus association is a mixture of two processes. On the one hand, these associations resulted from multiple de novo establishments of symbiosis; on the other hand, in some clusters, they experienced a period of coevolution with the hosts. The strains that adapted to obligate mutualism, while originating from different phylogenetic lineages, display notable metabolic convergence, particularly in B vitamin pathways.

At the general level, in relation to reconstructions of metabolic roles in the obligate blood feeders, the study revealed several weaknesses/problems and also the advantages of such a comparative approach. The first weakness derives from uncertainty in determining and annotating the genes. When comparing our annotations with other published studies, we occasionally found differences, usually related to highly modified genes (e.g., panE in two species of Riesia [18, 25]). We have observed similar discrepancies even within our study between the functional annotations obtained by different programs (e.g., PROKKA vs BlastKOALA vs Pseudofinder). This problem is particularly significant in strains with strongly degenerated genomes, where the genes could be shortened, dramatically changed in nucleotide composition, or pseudogenized. Another weakness stems from our incomplete knowledge of the structure of the pathways. It has been shown in various studies that the functional flexibility of some pathways may be higher than expected. For example, some missing enzymes may not hamper the pathway functionality (69–71); others may be replaced with different enzymes (72, 73), sometimes with so-called promiscuous enzymes (74). In contrast, the advantage of a comparative study (apart from providing background for evolutionary interpretation) lies in the indication of possible annotation artifacts, which would be difficult to identify based on single-genome analysis. In our data, we have observed complete pathways lacking the same single enzyme in several taxa (Table S2, numbers highlighted in orange), e.g., tenI (row 6 in Table S2), yigB (row 14), epd (row 19), and nudB (row 48). For all these genes, it has been previously demonstrated that they are either not essential for the pathway functionality or can be replaced with different enzymes (70, 73, 75, 76). This strongly suggests that the pathway is conserved by selection with the exception of the missing gene.
The genes and pathways indicated in this way can then be analyzed in more detail, which should lead to a better understanding of the degenerative processes during the adaptation to obligate symbiosis.
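The comparative signal described above, an otherwise complete pathway that lacks the same single gene in several taxa, can be screened for with a few lines of code. The following is a minimal sketch under stated assumptions: pathways are treated as flat gene sets, the gene and strain names are placeholders, and the function names and the `min_genomes` parameter are hypothetical rather than part of the study's workflow.

```python
def single_gene_gaps(pathway_genes, genomes):
    """Return, per gene, the genomes whose pathway lacks only that gene.

    `genomes` maps a genome name to the collection of genes annotated in it.
    """
    required = set(pathway_genes)
    gaps = {}
    for name, present in genomes.items():
        missing = required - set(present)
        if len(missing) == 1:
            gaps.setdefault(next(iter(missing)), []).append(name)
    return {gene: sorted(names) for gene, names in gaps.items()}


def recurrent_gaps(pathway_genes, genomes, min_genomes=2):
    """Keep only single-gene gaps that recur in at least `min_genomes` genomes."""
    gaps = single_gene_gaps(pathway_genes, genomes)
    return {gene: names for gene, names in gaps.items()
            if len(names) >= min_genomes}


# Hypothetical toy data with placeholder gene and strain names.
toy_pathway = ["geneA", "geneB", "geneC", "geneD"]
toy_genomes = {
    "strain_1": {"geneA", "geneB", "geneC"},           # lacks only geneD
    "strain_2": {"geneA", "geneB", "geneC"},           # lacks only geneD
    "strain_3": {"geneA", "geneB", "geneC", "geneD"},  # complete
    "strain_4": {"geneA"},                             # pathway degraded
    "strain_5": {"geneA", "geneB", "geneD"},           # lacks only geneC
}

print(recurrent_gaps(toy_pathway, toy_genomes))
```

A recurrent gap flagged this way is only a candidate: it may reflect a dispensable or replaceable enzyme, as discussed above, or an annotation artifact, and distinguishing between these requires inspecting the individual loci.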

ACKNOWLEDGMENTS

Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum provided under the program “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042) is greatly appreciated.

We thank Greg D.D. Hurst for his valuable comments and edits to the manuscript.

This work was supported by the Grant Agency of the Czech Republic (grant number 20-07674S to V.H.).

Contributor Information

Václav Hypša, Email: vacatko@prf.jcu.cz.

Jack A. Gilbert, University of California San Diego, La Jolla, California, USA

DATA AVAILABILITY

The genome assemblies of Arsenophonus symbionts are available from GenBank under BioProject number PRJNA949118 (individual accessions are provided in Table 1). The mitochondrial genomes of the corresponding Hippoboscidae are available from GenBank under accession numbers OR348033 to OR348041. The alignments (in fasta format), phylogenetic trees (in newick format), and annotations of Arsenophonus genomes (in gbk format) are deposited in Mendeley Data.

SUPPLEMENTAL MATERIAL

The following material is available online at https://doi.org/10.1128/msystems.00706-23.

Supplemental figures. msystems.00706-23-s0001.pdf.

Fig. S1 to S3.

DOI: 10.1128/msystems.00706-23.SuF1

Table S1. msystems.00706-23-s0002.xlsx.

Metadata (accession numbers, lists of genes).

DOI: 10.1128/msystems.00706-23.SuF2

Table S2. msystems.00706-23-s0003.xlsx.

Metabolic capacities of the Arsenophonus strains.

DOI: 10.1128/msystems.00706-23.SuF3

ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

REFERENCES

  • 1. Baumann P. 2005. Biology of bacteriocyte-associated endosymbionts of plant sap-sucking insects. Annu Rev Microbiol 59:155–189. doi: 10.1146/annurev.micro.59.030804.121041
  • 2. Moran NA. 1996. Accelerated evolution and Muller’s rachet in endosymbiotic bacteria. Proc Natl Acad Sci U S A 93:2873–2878. doi: 10.1073/pnas.93.7.2873
  • 3. Bennett GM, Moran NA. 2015. Heritable Symbiosis: the advantages and perils of an evolutionary rabbit hole. Proc Natl Acad Sci U S A 112:10169–10176. doi: 10.1073/pnas.1421388112
  • 4. Nováková E, Hypša V, Moran NA. 2009. Arsenophonus, an emerging clade of intracellular symbionts with a broad host distribution. BMC Microbiol 9:143. doi: 10.1186/1471-2180-9-143
  • 5. Hypša V, Dale C. 1997. In vitro culture and phylogenetic analysis of "Candidatus Arsenophonus triatominarum," an intracellular bacterium from the triatomine bug, Triatoma infestans. Int J Syst Bacteriol 47:1140–1144. doi: 10.1099/00207713-47-4-1140
  • 6. Šorfová P, Škeříková A, Hypša V. 2008. An effect of 16S rRNA Intercistronic variability on coevolutionary analysis in symbiotic bacteria: molecular phylogeny of Arsenophonus triatominarum. Syst Appl Microbiol 31:88–100. doi: 10.1016/j.syapm.2008.02.004
  • 7. Rodríguez-Ruano SM, Škochová V, Rego ROM, Schmidt JO, Roachell W, Hypša V, Nováková E. 2018. Microbiomes of North American triatominae: the grounds for chagas disease epidemiology. Front Microbiol 9:1167. doi: 10.3389/fmicb.2018.01167
  • 8. Hypša V. 1993. Endocytobionts of Triatoma infestans – distribution and transmission. J Invertebr Pathol 61:32–38. doi: 10.1006/jipa.1993.1006
  • 9. Sasaki-Fukatsu K, Koga R, Nikoh N, Yoshizawa K, Kasai S, Mihara M, Kobayashi M, Tomita T, Fukatsu T. 2006. Symbiotic bacteria associated with stomach discs of human lice. Appl Environ Microbiol 72:7349–7352. doi: 10.1128/AEM.01429-06
  • 10. Allen JM, Reed DL, Perotti MA, Braig HR. 2007. Evolutionary relationships of "Candidatus Riesia spp.," endosymbiotic enterobacteriaceae living within hematophagous primate lice. Appl Environ Microbiol 73:1659–1664. doi: 10.1128/AEM.01877-06
  • 11. Šochová E, Husník F, Nováková E, Halajian A, Hypša V. 2017. Arsenophonus and Sodalis replacements shape evolution of symbiosis in louse flies. Peerj 5. doi: 10.7717/peerj.4099
  • 12. Kutty SN, Pape T, Wiegmann BM, Meier R. 2010. Molecular phylogeny of the calyptratae (Diptera: Cyclorrhapha) with an emphasis on the superfamily Oestroidea and the position of Mystacinobiidae and McAlpine’s fly. Syst Entomol 35:614–635. doi: 10.1111/j.1365-3113.2010.00536.x
  • 13. Lehane MJ. 2005. The biology of blood-sucking in insects. 2nd ed. Cambridge University Press, Cambridge. doi: 10.1017/CBO9780511610493
  • 14. Douglas AE. 2017. The B vitamin nutrition of insects: the contributions of diet, microbiome and horizontally acquired genes. Curr Opin Insect Sci 23:65–69. doi: 10.1016/j.cois.2017.07.012
  • 15. Nogge G. 1978. Aposymbiotic tsetse flies, Glossina morsitans morsitans obtained by feeding on rabbits immunized specifically with symbionts. J Insect Physiol 24:299–304. doi: 10.1016/0022-1910(78)90026-4
  • 16. Baines S. 1956. The role of the symbiotic bacteria in the nutrition of Rhodnius prolixus (Hemiptera). J Exp Biol 33:533–541. doi: 10.1242/jeb.33.3.533
  • 17. Puchta O. 1955. Experimental investigations on the significance of symbiotic organisms in P. humanus var. corporis. Zeitschrift fur Parasitenkunde 17:1–40. doi: 10.1007/BF00260226
  • 18. Boyd BM, Allen JM, de Crécy-Lagard V, Reed DL. 2014. Genome sequence of Candidatus Riesia pediculischaeffi, endosymbiont of chimpanzee lice, and genomic comparison of recently acquired endosymbionts from human and chimpanzee lice. G3 (Bethesda) 4:2189–2195. doi: 10.1534/g3.114.012567
  • 19. Říhová J, Nováková E, Husník F, Hypša V. 2017. Legionella becoming a Mutualist: adaptive processes shaping the genome of Symbiont in the louse Polyplax serrata. Genome Biol Evol 9:2946–2957. doi: 10.1093/gbe/evx217
  • 20. Rio RVM, Symula RE, Wang JW, Lohs C, Wu YN, Snyder AK, Bjornson RD, Oshima K, Biehl BS, Perna NT, Hattori M, Aksoy S. 2012. Insight into the transmission biology and species-specific functional capabilities of tsetse (Diptera: glossinidae) obligate symbiont Wigglesworthia. Mbio 3:e00240-11. doi: 10.1128/mBio.00240-11
  • 21. Nováková E, Husník F, Šochová E, Hypša V. 2015. Arsenophonus and Sodalis symbionts in louse flies: an analogy to the Wigglesworthia and Sodalis system in tsetse flies. Appl Environ Microbiol 81:6189–6199. doi: 10.1128/AEM.01487-15
  • 22. Nováková E, Hypša V, Nguyen P, Husník F, Darby AC. 2016. Genome sequence of Candidatus Arsenophonus lipopteni, the exclusive symbiont of a blood sucking fly Lipoptena cervi (Diptera: Hippoboscidae). Stand Genomic Sci 11:72. doi: 10.1186/s40793-016-0195-1
  • 23. Říhová J, Bell KC, Nováková E, Hypša V. 2022. Lightella neohaematopini: a new lineage of highly reduced endosymbionts Coevolving with chipmunk lice of the genus Neohaematopinus. Front Microbiol 13:900312. doi: 10.3389/fmicb.2022.900312
  • 24. Říhová J, Batani G, Rodríguez-Ruano SM, Martinů J, Vácha F, Nováková E, Hypša V. 2021. A new symbiotic lineage related to Neisseria and Snodgrassella arises from the dynamic and diverse microbiomes in sucking lice. Mol Ecol 30:2178–2196. doi: 10.1111/mec.15866
  • 25. Boyd BM, Allen JM, Nguyen N-P, Vachaspati P, Quicksall ZS, Warnow T, Mugisha L, Johnson KP, Reed DL. 2017. Primates, lice and bacteria: speciation and genome evolution in the Symbionts of hominid lice. Mol Biol Evol 34:1743–1757. doi: 10.1093/molbev/msx117
  • 26. Andrews S. 2010. Fastqc: A quality control tool for high throughput sequence data. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
  • 27. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. Spades: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi: 10.1089/cmb.2012.0021
  • 28. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, Thierer T, Ashton B, Meintjes P, Drummond A. 2012. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28:1647–1649. doi: 10.1093/bioinformatics/bts199
  • 29. De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. 2018. Nanopack: visualizing and processing long-read sequencing data. Bioinformatics 34:2666–2669. doi: 10.1093/bioinformatics/bty149
  • 30. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27:722–736. doi: 10.1101/gr.215087.116
  • 31. Li H. 2018. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100. doi: 10.1093/bioinformatics/bty191
  • 32. Vaser R, Sović I, Nagarajan N, Šikić M. 2017. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27:737–746. doi: 10.1101/gr.214270.116
  • 33. Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069. doi: 10.1093/bioinformatics/btu153
  • 34. Emms DM, Kelly S. 2019. Orthofinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20:238. doi: 10.1186/s13059-019-1832-y
  • 35. Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066. doi: 10.1093/nar/gkf436
  • 36. Talavera G, Castresana J. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56:564–577. doi: 10.1080/10635150701472164
  • 37. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321. doi: 10.1093/sysbio/syq010
  • 38. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R, Teeling E. 2020. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol 37:1530–1534. doi: 10.1093/molbev/msaa015
  • 39. Lartillot N, Rodrigue N, Stubbs D, Richer J. 2013. PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst Biol 62:611–615. doi: 10.1093/sysbio/syt022
  • 40. Lefort V, Longueville J-E, Gascuel O. 2017. SMS: smart model selection in PhyML. Mol Biol Evol 34:2422–2424. doi: 10.1093/molbev/msx149
  • 41. Lartillot N, Brinkmann H, Philippe H. 2007. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 7:S4. doi: 10.1186/1471-2148-7-S1-S4
  • 42. Husník F, Chrudimský T, Hypša V. 2011. Multiple origins of endosymbiosis within the enterobacteriaceae (gamma-proteobacteria): convergence of complex phylogenetic approaches. BMC Biol. 9:87. doi: 10.1186/1741-7007-9-87
  • 43. Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, Pütz J, Middendorf M, Stadler PF. 2013. MITOS: improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol 69:313–319. doi: 10.1016/j.ympev.2012.08.023
  • 44. Arndt D, Grant JR, Marcu A, Sajed T, Pon A, Liang Y, Wishart DS. 2016. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res 44:W16–W21. doi: 10.1093/nar/gkw387
  • 45. Syberg-Olsen MJ, Garber AI, Keeling PJ, McCutcheon JP, Husnik F. 2022. Pseudofinder: detection of pseudogenes in prokaryotic genomes. Mol Biol Evol 39:msac153. doi: 10.1093/molbev/msac153
  • 46. Darling AE, Mau B, Perna NT. 2010. progressiveMauve: multiple genome alignment with gene gain. PLoS One 5:e11147. doi: 10.1371/journal.pone.0011147
  • 47. Gilchrist CLM, Chooi Y-H. 2021. Clinker & clustermap.Js: automatic generation of gene cluster comparison figures. Bioinformatics 37:2473–2475. doi: 10.1093/bioinformatics/btab007
  • 48. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. 2016. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 44:D457–62. doi: 10.1093/nar/gkv1070
  • 49. Liu C, Cui Y, Li X, Yao M. 2021. Microeco: an R package for data mining in microbial community ecology. FEMS Microbiol Ecol 97:fiaa255. doi: 10.1093/femsec/fiaa255
  • 50. Ito K, Murphy D. 2013. Application of ggplot2 to pharmacometric graphics. CPT Pharmacometrics Syst Pharmacol 2:e79. doi: 10.1038/psp.2013.56
  • 51. Bergsten J. 2005. A review of long-branch attraction. Cladistics 21:163–193. doi: 10.1111/j.1096-0031.2005.00059.x
  • 52. Balmand S, Lohs C, Aksoy S, Heddi A. 2013. Tissue distribution and transmission routes for the tsetse fly endosymbionts. J Invertebr Pathol 112 Suppl:S116–22. doi: 10.1016/j.jip.2012.04.002
  • 53. Tamas I, Klasson L, Canbäck B, Näslund AK, Eriksson A-S, Wernegreen JJ, Sandström JP, Moran NA, Andersson SGE. 2002. 50 million years of genomic stasis in endosymbiotic bacteria. Science 296:2376–2379. doi: 10.1126/science.1071278
  • 54. Wernegreen JJ. 2002. Genome evolution in bacterial endosymbionts of insects. Nat Rev Genet 3:850–861. doi: 10.1038/nrg931
  • 55. González-Domenech CM, Belda E, Patiño-Navarrete R, Moya A, Peretó J, Latorre A. 2012. Metabolic stasis in an ancient symbiosis: genome-scale metabolic networks from two blattabacterium cuenoti strains, primary endosymbionts of cockroaches. BMC Microbiol 12 Suppl 1:S5. doi: 10.1186/1471-2180-12-S1-S5
  • 56. Sloan DB, Moran NA. 2013. The evolution of genomic instability in the obligate endosymbionts of whiteflies. Genome Biol Evol 5:783–793. doi: 10.1093/gbe/evt044
  • 57. Soucy SM, Huang J, Gogarten JP. 2015. Horizontal gene transfer: building the web of life. Nat Rev Genet 16:472–482. doi: 10.1038/nrg3962
  • 58. Moran NA. 2003. Tracing the evolution of gene loss in obligate bacterial symbionts. Curr Opin Microbiol 6:512–518. doi: 10.1016/j.mib.2003.08.001
  • 59. Martinez-Gomez NC, Palmer LD, Vivas E, Roach PL, Downs DM. 2011. The Rhodanese domain of thiI is both necessary and sufficient for synthesis of the thiazole moiety of thiamine in Salmonella enterica. J Bacteriol 193:4582–4587. doi: 10.1128/JB.05325-11
  • 60. Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Marchler GH, Song JS, Thanki N, Yamashita RA, Yang M, Zhang D, Zheng C, Lanczycki CJ, Marchler-Bauer A. 2020. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res 48:D265–D268. doi: 10.1093/nar/gkz991
  • 61. Snyder AK, McLain C, Rio RVM. 2012. The Tsetse fly obligate mutualist Wigglesworthia morsitans alters gene expression and population density via exogenous nutrient provisioning. Appl Environ Microbiol 78:7792–7797. doi: 10.1128/AEM.02052-12 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Kenyon LJ, Meulia T, Sabree ZL. 2015. Habitat visualization and genomic analysis of "Candidatus Pantoea carbekii," the primary symbiont of the brown marmorated stink bug. Genome Biol Evol 7:620–635. doi: 10.1093/gbe/evv006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Gilliland CA, Patel V, McCormick AC, Mackett BM, Vogel KJ. 2023. Using axenic and gnotobiotic insects to examine the role of different microbes on the development and reproduction of the kissing bug Rhodnius prolixus (Hemiptera: Reduviidae). Mol Ecol 32:920–935. doi: 10.1111/mec.16800 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Santos-Garcia D, Juravel K, Freilich S, Zchori-Fein E, Latorre A, Moya A, Morin S, Silva FJ. 2018. To B or not to B: comparative genomics suggests Arsenophonus as a source of B vitamins in whiteflies. Front Microbiol 9:2254. doi: 10.3389/fmicb.2018.02254 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Husnik F, Hypsa V, Darby A, Cordaux RT. 2020. Insect-symbiont gene expression in the midgut bacteriocytes of a blood-sucking parasite. Genome Biol and Evol 12:429–442. doi: 10.1093/gbe/evaa032 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Kirkness EF, Haas BJ, Sun W, Braig HR, Perotti MA, Clark JM, Lee SH, Robertson HM, Kennedy RC, Elhaik E, Gerlach D, Kriventseva EV, Elsik CG, Graur D, Hill CA, Veenstra JA, Walenz B, Tubío JMC, Ribeiro JMC, Rozas J, Johnston JS, Reese JT, Popadic A, Tojo M, Raoult D, Reed DL, Tomoyasu Y, Kraus E, Mittapalli O, Margam VM, Li H-M, Meyer JM, Johnson RM, Romero-Severson J, Vanzee JP, Alvarez-Ponce D, Vieira FG, Aguadé M, Guirao-Rico S, Anzola JM, Yoon KS, Strycharz JP, Unger MF, Christley S, Lobo NF, Seufferheld MJ, Wang N, Dasch GA, Struchiner CJ, Madey G, Hannick LI, Bidwell S, Joardar V, Caler E, Shao R, Barker SC, Cameron S, Bruggner RV, Regier A, Johnson J, Viswanathan L, Utterback TR, Sutton GG, Lawson D, Waterhouse RM, Venter JC, Strausberg RL, Berenbaum MR, Collins FH, Zdobnov EM, Pittendrigh BR. 2010. Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. Proc Natl Acad Sci U S A 107:12168–12173. doi: 10.1073/pnas.1003379107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Husník F. 2018. Host-symbiont-pathogen interactions in blood-feeding parasites: nutrition, immune cross-talk and gene exchange. Parasitology 145:1294–1303. doi: 10.1017/S0031182018000574 [DOI] [PubMed] [Google Scholar]
  • 68. Chrudimský T, Husník F, Nováková E, Hypša V. 2012. Candidatus Sodalis melophagi sp nov.: phylogenetically independent comparative model to the tsetse fly symbiont Sodalis glossinidius. PLoS One 7:e40354. doi: 10.1371/journal.pone.0040354 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H. 2006. Construction of Escherichia coli K-12 in-frame, single-gene knockout Mutants: the keio collection. Mol Syst Biol 2:2006. doi: 10.1038/msb4100050 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Haase I, Sarge S, Illarionov B, Laudert D, Hohmann H-P, Bacher A, Fischer M. 2013. Enzymes from the haloacid dehalogenase (HAD) superfamily catalyse the elusive dephosphorylation step of riboflavin biosynthesis. Chembiochem 14:2272–2275. doi: 10.1002/cbic.201300544 [DOI] [PubMed] [Google Scholar]
  • 71. Liu S, Hu W, Wang Z, Chen T. 2020. Production of riboflavin and related cofactors by biotechnological processes. Microb Cell Fact 19:31. doi: 10.1186/s12934-020-01302-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Richts B, Commichau FM. 2021. Underground metabolism facilitates the evolution of novel pathways for vitamin B6 biosynthesis. Appl Microbiol Biotechnol 105:2297–2305. doi: 10.1007/s00253-021-11199-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Tramonti A, Nardella C, de Salvo ML, Barile A, D’Alessio F, de Crécy-Lagard V, Contestabile R. 2021. Knowns and unknowns of vitamin B(6) metabolism in Escherichia Coli. EcoSal Plus 9:1128. doi: 10.1128/ecosalplus.ESP-0004-2021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Price DRG, Wilson ACC. 2014. A substrate ambiguous enzyme facilitates genome reduction in an intracellular symbiont. BMC Biol. 12:110. doi: 10.1186/s12915-014-0110-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. Hazra AB, Han Y, Chatterjee A, Zhang Y, Lai R-Y, Ealick SE, Begley TP. 2011. A missing enzyme in thiamin thiazole biosynthesis: identification of Teni as a thiazole tautomerase. J Am Chem Soc 133:9311–9319. doi: 10.1021/ja1110514 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Li K, Li T, Yang S-S, Wang X-D, Gao L-X, Wang R-Q, Gu J, Zhang X-E, Deng J-Y. 2017. Deletion of nudB causes increased susceptibility to antifolates in Escherichia coli and Salmonella enterica. Antimicrob Agents Chemother 61:e02378-16. doi: 10.1128/AAC.02378-16 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplemental figures. msystems.00706-23-s0001.pdf.

Fig. S1 to S3.

DOI: 10.1128/msystems.00706-23.SuF1

Table S1. msystems.00706-23-s0002.xlsx.

Metadata (accession numbers, lists of genes).

DOI: 10.1128/msystems.00706-23.SuF2

Table S2. msystems.00706-23-s0003.xlsx.

Metabolic capacities of the Arsenophonus strains.

DOI: 10.1128/msystems.00706-23.SuF3

Data Availability Statement

The genome assemblies of Arsenophonus symbionts are available from GenBank under BioProject number PRJNA949118 (individual accessions are provided in Table 1). The mitochondrial genomes of the corresponding Hippoboscidae are available from GenBank under accession numbers OR348033 to OR348041. The alignments (in FASTA format), phylogenetic trees (in Newick format), and annotations of the Arsenophonus genomes (in GenBank gbk format) are deposited in Mendeley Data.


Articles from mSystems are provided here courtesy of American Society for Microbiology (ASM)