Genome Biology and Evolution. 2022 Jun 22;14(7):evac098. doi: 10.1093/gbe/evac098

Pangenome Evolution in Environmentally Transmitted Symbionts of Deep-Sea Mussels Is Governed by Vertical Inheritance

Devani Romero Picazo 1, Almut Werner 2, Tal Dagan 3, Anne Kupczok 4,5,6
Editor: Esperanza Martinez-Romero
PMCID: PMC9260185  PMID: 35731940

Abstract

Microbial pangenomes vary across species; their size and structure are determined by genetic diversity within the population and by gene loss and horizontal gene transfer (HGT). Many bacteria are associated with eukaryotic hosts where the host colonization dynamics may impact bacterial genome evolution. Host-associated lifestyle has been recognized as a barrier to HGT in parentally transmitted bacteria. However, pangenome evolution of environmentally acquired symbionts remains understudied, often due to limitations in symbiont cultivation. Using high-resolution metagenomics, here we study pangenome evolution of two co-occurring endosymbionts inhabiting Bathymodiolus brooksi mussels from a single cold seep. The symbionts, sulfur-oxidizing (SOX) and methane-oxidizing (MOX) gamma-proteobacteria, are environmentally acquired at an early developmental stage and individual mussels may harbor multiple strains of each symbiont species. We found differences in the accessory gene content of both symbionts across individual mussels, which are reflected by differences in symbiont strain composition. Compared with core genes, accessory genes are enriched in genome plasticity functions. We found no evidence for recent HGT between both symbionts. A comparison between the symbiont pangenomes revealed that the MOX population is less diverged and contains fewer accessory genes, supporting that the MOX association with B. brooksi is more recent in comparison to that of SOX. Our results show that the pangenomes of both symbionts evolved mainly by vertical inheritance. We conclude that genome evolution of environmentally transmitted symbionts that associate with individual hosts over their lifetime is affected by a narrow symbiosis where the frequency of HGT is constrained.

Keywords: pangenome, high-resolution metagenomics, horizontal gene transfer, accessory genome


Significance.

Bacteria can acquire genes from other bacteria via horizontal transfer. Thus, individuals of the same species can vary in their gene content. Environmental bacteria have huge pangenomes, that is, the total number of genes in a species. However, many bacteria are associated with eukaryotic hosts, such as symbionts with little contact to other bacteria, and their pangenomes are not well investigated. Here we analyze the pangenomes of deep-sea mussel symbionts. These symbionts are taken up anew from the environment in each mussel generation, which connects them potentially to environmental populations. Surprisingly, we observe rather small pangenomes and not much horizontal gene transfer. We conclude that these symbionts mainly live in the host environment, with limited access to horizontally transferred genes.

Introduction

Bacterial populations can show enormous genomic diversity, which comprises nucleotide differences between homologous sequences and variation in the accessory gene content. In particular, gene content diversity is described by the species pangenome, which consists of all the genomic sequences present across individuals of a bacterial species. The core genes in a pangenome are present in each individual while the remaining genes are considered accessory (Brockhurst et al. 2019). Pangenome size and structure vary across bacterial species (Maistrenko et al. 2020), and pangenomic diversity is important for bacterial adaptation in environmental species, where accessory genes are often niche-specific (Kashtan et al. 2014; Liao et al. 2021; Conrad et al. 2022). To understand microbial adaptation, it is thus crucial to understand the evolutionary processes that shape pangenome diversity. The main processes that give rise to microbial pangenomes are gene duplication and loss during vertical inheritance and gene acquisition via horizontal gene transfer (HGT). HGT enables the transfer of genetic material between microbial individuals that are not related by inheritance (Hall et al. 2017) and is particularly relevant for the evolution of microbial pangenomes (Treangen and Rocha 2011; Tria and Martin 2021). Some mechanisms of HGT involve the activity of mobile genetic elements (MGEs)—such as phages, plasmids, transposons, or genomic islands—for transferring genetic material between different DNA strands.

Many bacterial species are known to be symbionts, that is, they are strictly or facultatively associated with eukaryotic hosts. Symbionts can have different modes of transmission; parentally (also called vertically) transmitted bacteria are transferred from adults to their progeny, while environmentally (also called horizontally) transmitted bacteria are acquired from the environment, either from a free-living population or from other hosts; mixed transmission modes are also common over long evolutionary time (Bright and Bulgheresi 2010; Russell 2019). We here prefer the terms environmental and parental symbiont transmission to clearly distinguish these processes from HGT and vertical inheritance. The host association has important implications for the adaptation of symbionts via HGT since bacterial populations that share a habitat may be able to access the habitat-specific gene pool by HGT (Bordenstein and Reznikoff 2005; Newton and Bordenstein 2011; Polz et al. 2013). Indeed, several studies demonstrated that gene transfer from locally adapted populations may facilitate host colonization. For example, in plant-associated communities, MGEs enabled the adaptation of locally adapted nitrogen-fixing soil bacteria to associate with novel crops during their domestication (Greenlon et al. 2019), and in sponges, diverse functions that potentially provide a selective advantage to the symbionts in that niche were acquired by HGT (Robbins et al. 2021). Notably, these examples stem from environmentally transmitted symbionts that might have a wider potential for HGT compared with parentally transmitted symbionts. First, infection of a host by multiple symbionts results in a shared environment, where the chances for HGT are higher, and second, genes can potentially be acquired from environmental bacteria during the free-living stage. Indeed, very few HGT events have been reported in well-studied insect symbioses, potentially due to genetic isolation linked to the intracellular lifestyle and parental transmission (Pinto-Carbó et al. 2016; López-Madrigal and Gil 2017; Waterworth et al. 2020).

Previous studies showed that the evolution of endosymbiont genomes is characterized by rare HGT and fewer accessory genes compared with environmental bacteria (Kloesges et al. 2011; Brockhurst et al. 2019). This conclusion was drawn based on a few model symbionts that have been cultivated and sequenced and that are mostly parentally transmitted. Populations of these parentally transmitted insect symbionts are characterized by a low intra-host diversity (Guyomar et al. 2018). However, less is known about environmentally transmitted symbionts, where multiple strains might colonize an individual host, resulting in within-host strain diversity. Our view of microbial diversity has been revolutionized in the last 20 years by cultivation-independent approaches, such as metagenomics (e.g., Giovannoni et al. 2014; Castelle and Banfield 2018). Additionally, since metagenomics enables us to assess the variation of all organisms in a particular environment, deeply sequenced metagenomes provide adequate datasets for studying variation within microbial populations (Denef 2019; Rossum et al. 2020). This approach revealed abundant within-host diversity of symbiont populations, e.g., in the gut microbiome of humans and bees (Ellegaard and Engel 2019; Garud et al. 2019).

The presence of strain diversity has recently been reported for environmentally transmitted symbionts that reside in Bathymodiolus mussels, where they are hosted in bacteriocytes within the gill epithelium and provide the mussel with nutrition (Won et al. 2003). After an aposymbiotic larval stage, the symbionts are acquired rapidly during the mussels’ metamorphosis from a planktonic to a benthic lifestyle, which is associated with morphological changes in the mussel’s epithelial tissue (Franke et al. 2021). As adults, mussel gills constantly develop new filaments that continuously take up symbionts, where older filaments of the same mussel might contribute substantially to the source of the colonization (Wentrup et al. 2014; Romero Picazo et al. 2019). It still remains unclear if the symbionts have an active free-living stage or whether they might be dormant and only replicate within the mussel (Ikuta et al. 2016; Laming et al. 2018). Bathymodiolus can be infected by two chemosynthetic symbiont species, sulfur-oxidizing (SOX) and methane-oxidizing (MOX) gamma-proteobacteria. Although most Bathymodiolus species harbor only a single 16S phylotype for each symbiont, metagenomic analyses of multiple Bathymodiolus species showed that different SOX and MOX strains can be present within an individual mussel (Ansorge et al. 2019; Romero Picazo et al. 2019). An important role of MGEs and HGT in the evolution of SOX symbiont genomes from hydrothermal vents at the Mid-Atlantic Ridge has been suggested. Compared with free-living relatives, SOX genomes were found to contain high numbers of transposases, integrases, restriction-modification systems, and toxin-related genes, where the latter are also linked to MGEs (Sayavedra et al. 2015). In addition, it has been observed that co-occurring SOX strains from these sites differ in the content of genes involved in energy and nutrient utilization and viral defense mechanisms (Ansorge et al. 2019).

To analyze how symbiont strain diversity varies across mussels, we have analyzed closely related, nearby hosts. Nineteen Bathymodiolus brooksi mussels were sampled from a single location at a cold seep site in the northern Gulf of Mexico. Since Bathymodiolus symbionts cannot be cultured, high-resolution metagenomics data was collected by deeply sequencing homogenized gill tissue of each mussel (Romero Picazo et al. 2019). We previously obtained single-sample assemblies and used a gene-based binning approach to reconstruct the core genomes of SOX and MOX (fig. 1). Based on single nucleotide variants (SNVs) within the core genes, reconstruction of core-genome-wide strains revealed eleven SOX strains that group into four clades, and six MOX strains that group into two clades (Romero Picazo et al. 2019). Mussel individuals may harbor one or multiple strains of each species. In particular, they can contain strains from one to three different SOX clades and from one or both MOX clades (Fig. 1 in [Romero Picazo et al. 2019]). We found a high variability of the nucleotide diversity π among samples, where samples with low π tend to have a lower strain diversity as estimated using α-diversity. We furthermore investigated genetic isolation using the fixation index FST, which revealed generally high genetic isolation between samples and also clusters of low genetic isolation where samples show highly similar SNV states and frequencies, that is, they contain similar populations for a particular symbiont. These clusters were also detected using the ecological measure β-diversity; thus, they are also similar in strain composition and frequencies, and they are not related to the genetics of these closely related hosts. Taken together, we found that the evolution of symbiont populations in individual mussels is characterized by genetic isolation, suggesting that symbionts are only taken up at an early stage in the mussel life cycle and are then confined to one mussel, resulting in geographic isolation (Romero Picazo et al. 2019).

Fig. 1.

High-resolution metagenomic analysis workflow. Arrows represent open reading frames (ORFs) inferred from the assemblies. Orange part: Core genome analysis as described before (Romero Picazo et al. 2019). The core genome analysis included the reconstruction of core genome strain sequences and estimation of population structure measured as both β-diversity and FST. Note that all SNVs are included in the FST calculation, whereas β-diversity is based on the strain composition and not all SNVs can be linked to strains by DESMAN. The nonredundant gene catalog (NRGC) comprises gene cluster representatives obtained by grouping highly similar genes across samples. Red part: Pangenome analysis presented in this paper. The analysis is shown for a single species for simplicity. The network approach allows reconstructing population pangenomes. For each sample, the complete set of contigs containing genes from the species pangenome corresponds to the reconstructed MAG (metagenome-assembled genome). Using mapping, we estimated the coverages for all genes in the pangenomes in each sample, also including genes that have not been reconstructed on the contigs of that sample. The relative abundance of genes in the pangenome is then used to estimate PST. Additionally, we reconstructed the gene content of single strains that are dominant in a sample.

Here, we study the effect of the geographical isolation on SOX and MOX pangenome evolution. To this end, we analyzed the population pangenomes of the SOX and MOX strains residing in 19 mussels sampled from a single location.

Results

A Network Approach for the Recovery of Symbiont Pangenomes from Microbiome Metagenomes

To reconstruct the SOX and MOX pangenomes, we recovered the accessory genomes using a network approach. The core genomes of both symbionts were used as starting points for the expansion of the pangenomes in the network. Here, we identified connections between the symbiont genes, which are located on different metagenomic contigs in each sample (fig. 1). These connected contigs are therefore identified as related to one symbiont and correspond to the reconstructed metagenome-assembled genomes (MAGs). Our pangenome inference revealed clear differences between the two symbionts. The pangenome of the SOX population comprises 2,484 genes with a total length of 2.27 Mbp. Of these, 962 (38.7% of total genes) were identified as accessory, where the majority are single-copy accessory genes (table 1). Each SOX MAG contains between 1,640 and 2,055 genes (average 1,885) with genome lengths that range between 1.72 and 2.21 Mbp (average 2.06 Mbp) (supplementary table S2A, Supplementary Material online). The MOX population pangenome comprises 2,866 genes with a total length of 2.24 Mbp, where 414 (14.5% of total genes) are accessory and the majority of accessory genes are single-copy (table 1). Each MOX MAG contains between 2,480 and 2,603 genes (average 2,546) with genome lengths between 2.39 and 2.50 Mbp (average 2.45 Mbp) (supplementary table S2B, Supplementary Material online). The SOX and MOX core genomes have an average GC content of 38% where the core GC content distributions differ significantly between SOX and MOX (table 1; supplementary fig. S1, Supplementary Material online). In both symbionts, the GC content of the accessory genome is significantly lower than that of the core genome (table 1; supplementary fig. S1, Supplementary Material online).
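
The network expansion described at the start of this section can be pictured with a small sketch. It assumes, purely for illustration, that two gene clusters are connected whenever they lie on the same contig in any sample, and that every cluster reachable from a core gene joins the pangenome. The edge definition, the absence of any filtering, and all names (`expand_pangenome`, `mussel_A`, `acc1`, and so on) are hypothetical and are not taken from the authors' pipeline.

```python
from collections import defaultdict, deque


def expand_pangenome(contigs, core_genes):
    """Grow a pangenome from core genes over contig co-occurrence links.

    contigs: dict mapping sample -> dict mapping contig id -> list of
             gene cluster ids found on that contig (toy input format).
    core_genes: iterable of gene cluster ids used as starting points.
    """
    # Assumption: clusters on the same contig are linked in the network.
    neighbours = defaultdict(set)
    for sample_contigs in contigs.values():
        for genes in sample_contigs.values():
            for gene in genes:
                neighbours[gene].update(g for g in genes if g != gene)

    # Breadth-first expansion starting from the core genome.
    pangenome = set(core_genes)
    queue = deque(core_genes)
    while queue:
        gene = queue.popleft()
        for other in neighbours[gene]:
            if other not in pangenome:
                pangenome.add(other)
                queue.append(other)
    return pangenome


# Toy data: two samples, one contig pair unrelated to the symbiont.
contigs = {
    "mussel_A": {
        "c1": ["core1", "acc1"],
        "c2": ["acc1", "acc2"],
        "c3": ["other1", "other2"],
    },
    "mussel_B": {"c1": ["core2", "acc3"]},
}
core = {"core1", "core2"}
pangenome = expand_pangenome(contigs, core)
accessory = pangenome - core
print(sorted(accessory))
```

In this toy input, `acc2` is reached only through `acc1`, which mirrors how accessory genes on contigs without core genes could still be linked to a symbiont, while `other1` and `other2` stay outside the pangenome.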

Table 1.

Description of SOX and MOX pangenomes

| Symbiont | Gene class | Number of genes | Total length (bp) | Median gene length (bp) | GC content | Median (IQR) GC content per gene | Number of SNVs | SNVs/kbp | Median SNVs/kbp per gene | Number of genes with SNVs (% of genes) | pN/pS | Median pN/pS per gene | Median pS per gene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SOX | Core | 1,522 | 1,386,003 | 735 | 0.374 | 0.380 (0.0502) | 17,835 | 12.87 | 11.9 | 1,332 (88%) | 0.135 | 0.09454 | 0.01613 |
| SOX | Single-copy accessory | 939 | 863,319 | 480 | 0.356 | 0.331 (0.0808) | 6,791 | 7.866 | 0 | 464 (49%) | 0.326 | 0.2432 | 0.009459 |
| SOX | Multi-copy accessory | 23 | 21,735 | 765 | 0.379 | | 789 | 36.30 | 28.6 | 22 (96%) | 0.404 | 0.2673 | 0.02461 |
| SOX | Total | 2,484 | 2,271,057 | | | | | | | | | | |
| MOX | Core | 2,452 | 1,961,220 | 637.5 | 0.38 | 0.380 (0.0401) | 4,585 | 2.338 | 1.63 | 1,632 (67%) | 0.421 | 0.2656 | 0.003208 |
| MOX | Single-copy accessory | 379 | 254,967 | 510 | 0.329 | 0.359 (0.0559) | 245 | 0.9609 | 0 | 81 (21%) | 0.514 | 0.2637 | 0.002921 |
| MOX | Multi-copy accessory | 35 | 19,686 | 402 | 0.377 | | 496 | 25.20 | 17.3 | 31 (89%) | 0.323 | 0.4145 | 0.01460 |
| MOX | Total | 2,866 | 2,235,873 | | | | | | | | | | |

Note.—Median pN/pS and median pS estimated among genes with at least one SNV. For the GC content analysis, single-copy and multi-copy accessory genes have been merged (see also supplementary fig. S1, Supplementary Material online). bp, base pair; kbp, kilobase pair; IQR, interquartile range.

To infer the reliability of our approach to reconstruct pangenomes from metagenomes, we investigated assembly statistics depending on the number of strain clades in each sample (supplementary table S2, fig. S3, Supplementary Material online). We observed that higher strain diversity results in longer MAGs and more genes for MOX, supporting that the MAGs are population genomes including genes from multiple strains. In contrast, the length and number of genes on SOX MAGs do not increase with strain diversity, which indicates that contig fragmentation leads to missing regions in the MAGs. When mapping the samples on the SOX accessory genome, we detected more genes for samples with higher strain diversity; thus, samples with more strain clades indeed contain more accessory genes. Since the following analyses are based on the reconstructed pangenomes and the frequency of each gene in each sample, our approach provides a less biased assessment of gene presence in each sample compared with using sample-specific MAGs.

Furthermore, to evaluate the sensitivity of the network approach to identify accessory genes, we estimated the recovery rate as the number of new genes that are added to the pangenome for each newly sampled mussel (supplementary fig. S2A and B, Supplementary Material online). We observed that 95% of the total accessory genome is detected when adding five samples for both bacterial species. This shows that our approach has the required sensitivity in order to recover accessory genes in the population pangenomes at this site. Sequencing coverage can further impact the sensitivity of our approach to recover pangenomes from metagenomic data. The two symbionts occur at different abundances within the mussel, which results in differences in the sequencing coverage of SOX and MOX. To investigate whether sequencing coverage might impact the sensitivity to detect accessory genes, we estimated the recovery rate for SOX when downsampling it to the MOX coverage. We found no decrease in the number of detected genes in SOX with this normalization, where also 95% of the total accessory genome is detected with five samples (supplementary fig. S2C, Supplementary Material online), and therefore, the recovery of the accessory genome is comparable with the approach using the full coverage.
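
The recovery rate described above amounts to a gene accumulation curve. The following minimal sketch counts how many distinct accessory genes are known after each added sample and reports how many samples are needed to reach a given fraction of the total. It uses a single fixed sample order and toy gene sets; whether the authors averaged over sample orderings is not stated here, so this is an assumption, and the function names are hypothetical.

```python
def accumulation_curve(sample_genes):
    """Cumulative number of distinct genes after each added sample."""
    seen = set()
    curve = []
    for genes in sample_genes:
        seen |= set(genes)
        curve.append(len(seen))
    return curve


def samples_needed(curve, fraction=0.95):
    """Smallest number of samples that recovers `fraction` of all genes."""
    target = fraction * curve[-1]
    for n_samples, n_genes in enumerate(curve, start=1):
        if n_genes >= target:
            return n_samples


# Toy accessory gene sets detected in four mussels (hypothetical data).
samples = [{"a", "b", "c"}, {"b", "d"}, {"a", "d"}, {"e"}]
curve = accumulation_curve(samples)
print(curve, samples_needed(curve))
```

Applied to real per-sample gene sets, a curve that flattens after a few samples corresponds to the observation that most of the accessory genome is already detected with five mussels.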

By comparing the pangenome characteristics of the two symbionts, we found that the MOX population pangenome contains more genes than SOX, whereas the accessory genome is larger in SOX. We found that the accessory genome sizes are less than 40% of the total pangenome sizes. Notably, this proportion is smaller than for the majority of known species pangenomes, where 38 of the 43 species pangenomes estimated from RefSeq records (Ding et al. 2018) have an accessory genome fraction above 40% and the five exceptional species are all obligate pathogens (Bordetella pertussis, Brucella melitensis, Chlamydia trachomatis, Mycobacterium tuberculosis, Mycoplasma pneumoniae). The low proportion of accessory genes in the pangenomes reconstructed here is likely due to the fact that we reconstruct the pangenomes from the sampling site, that is, not that of the entire species, where the species pangenome is expected to be larger. The lower GC content in the accessory genomes of both species may indicate 1) that genes are transferred from an external source with low GC content or that genes with lower GC content are preferably transferred or 2) that accessory and core genes are under different selection regimes, where the higher GC content in the core genome is maintained by purifying selection (Bohlin et al. 2017).

Gene Content Shows Genetic Isolation between Mussels That Is Explained by Strain Composition

The intracellular lifestyle of the symbionts results in strong geographic isolation between mussels and this leads to genetically isolated symbiont populations (Romero Picazo et al. 2019). To study how genetic isolation impacts pangenome evolution in the mussel symbionts, we examined the SOX and MOX gene content variation within and across individual mussels.

To analyze symbiont gene content diversity within individual mussels, we estimated the gene content diversity φ, which is based on the relative frequency of genes in a population or subpopulation (see Materials and Methods). We found that φ is positively associated with the nucleotide diversity π and both measures increase with α-diversity estimated from the strain composition (fig. 2A and B; supplementary fig. S4A and B, Supplementary Material online). We did not observe any difference in φ between SOX and MOX (two-sided Wilcoxon signed-rank test, P-value = 0.86), although π is significantly lower in MOX compared with SOX (two-sided Wilcoxon signed-rank test, P-value = 5.23e−4). The lower coverage of the MOX data could cause difficulties in estimating accurate gene frequencies and result in an overestimation of φ. To investigate the impact of coverage on the φ estimation, we estimated φ for SOX downsampled to the MOX coverage. We found a strong correlation between SOX φ and φ for the downsampled data, where the downsampling resulted in a slight underestimation of φ (supplementary fig. S4C, Supplementary Material online). We therefore conclude that the high values of MOX φ cannot be explained by the lower MOX coverage.

Fig. 2.

Relationships between different measures for population diversity and genetic isolation (A) Relationships of gene content diversity (φ) with nucleotide diversity (π) and α-diversity for SOX and MOX. (B) Correlation coefficients and P-values for Pearson’s product-moment correlation. (C) Relationships of pangenome fixation index (PST) with fixation index (FST) and β-diversity for SOX. PST (median 0.3417) is significantly lower than FST (median 0.6216) (two-sided Wilcoxon signed-rank test, P-value < 2.2e−16). (D) Relationships of PST with FST and β-diversity for MOX. PST (median 0.1043) is significantly lower than FST (median 0.4290) (two-sided Wilcoxon signed-rank test, P-value < 2.2e−16).

To study the degree of isolation between mussels, we developed the pangenome fixation index PST, which is based on the gene content diversity φ and derived in analogy to the fixation index (FST) (see Materials and Methods). Small values of FST or PST indicate that the samples stem from the same population, whereas large values indicate that the samples constitute subpopulations. We found that PST measures a lower degree of isolation than FST, which is particularly pronounced in MOX (fig. 2C and D, supplementary fig. S5, Supplementary Material online). Despite PST being lower than FST, both measures are correlated, where the correlation is especially strong for SOX. Furthermore, we observed that for both SOX and MOX, the pairwise correlation is highest when comparing PST and β-diversity. The β-diversity measure is based on the strain relationships and the strain distribution in the mussels (fig. 1). Thus, differences in strain composition are even more strongly correlated to the pangenome fixation index than to the SNV-based fixation index. We have observed before that some pairwise comparisons have high FST but low β-diversity, especially for MOX (Romero Picazo et al. 2019). Notably, all SNVs are considered for the FST estimation, but not all SNVs can be linked to strains by DESMAN, which can result in discrepancies between these two measures. Here we observed that PST is low for these discrepant pairs, resulting in the dark blue clouds in the bottom right part of figure 2C and D. Thus, PST reflects β-diversity more strongly than it reflects FST. We also observed that samples cluster based on these diversity measures, where the clustering based on PST is very similar to the clustering based on β-diversity for both symbionts (supplementary fig. S4, Supplementary Material online).
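
To make the idea of an FST-style index on gene content concrete, here is a minimal sketch. The exact definitions of φ and PST are given in Materials and Methods and are not reproduced here, so the formulas below are assumptions chosen by analogy to a π-based FST: within-sample diversity is taken as the mean of 2f(1 − f) over genes, between-sample diversity as the mean of f1(1 − f2) + f2(1 − f1), and the index as 1 − within/between. The function names are hypothetical.

```python
def gene_content_diversity(freqs):
    """Assumed within-sample diversity: mean of 2 f (1 - f) over genes."""
    return sum(2 * f * (1 - f) for f in freqs) / len(freqs)


def between_sample_diversity(freqs1, freqs2):
    """Assumed between-sample diversity: chance that a gene is present in
    a cell drawn from one sample and absent in a cell from the other."""
    pairs = list(zip(freqs1, freqs2))
    return sum(f1 * (1 - f2) + f2 * (1 - f1) for f1, f2 in pairs) / len(pairs)


def fixation_style_index(freqs1, freqs2):
    """FST-like index on gene frequencies: 1 - within / between."""
    within = (gene_content_diversity(freqs1) + gene_content_diversity(freqs2)) / 2
    between = between_sample_diversity(freqs1, freqs2)
    if between == 0:
        return 0.0  # no variation at all, treat as one population
    return 1 - within / between


# Relative frequencies of three genes in two mussels (toy values).
mussel_1 = [1.0, 0.5, 0.0]
mussel_2 = [1.0, 0.5, 1.0]
index = fixation_style_index(mussel_1, mussel_2)
print(round(index, 3))
```

Under these assumed definitions, two samples with identical gene frequencies give an index of 0, and genes that are fixed in one sample but absent in the other push the index toward 1, which matches the interpretation of small and large PST values given above.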

Taken together, we derived methods to study population diversity and genetic isolation based on gene content. We find that the measured degree of isolation is stronger when measured at the level of SNVs (i.e., FST) in comparison to gene content (i.e., PST), which might be due to the fact that SNVs and their frequencies can be measured with a higher accuracy. We find that genetic isolation based on gene content is highly associated with β-diversity based on strain composition. This supports that gene content variation is strongly associated with the strain relationships instead of being mobile between strains. Since β-diversity is mostly driven by differences among the strain clades rather than differences among strains within a clade, the strong correlation further suggests that gene content differences are mainly found among strain clades. We thus conclude that gene content differences between mussels are related to the strain compositions within the mussels and especially to their clade compositions.

Strain Clades Are Defined by Gene Content Differences

In the previous section, we compared the gene content within and between individual mussels. Next, we aimed to resolve the accessory gene content of individual strains by identifying the presence of genes in mussels infected by a single strain. We defined strains to be dominant in a mussel when their frequency in a sample is at least 0.7 (supplementary table S2, Supplementary Material online). For a sample with a dominant strain, genes are present in a strain when their coverage is at least 50% of the median core genome coverage in that sample and we termed all these genes to be “assigned to that strain” (supplementary fig. S6, Supplementary Material online). We further merged the strain-assigned genes of all samples with the same dominant strain. Similarly, clade-assigned genes resulted from merging the gene content of strains belonging to the same clade.
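
The two thresholds in this paragraph translate directly into a small filter. The sketch below applies the stated rules (strain frequency of at least 0.7 for dominance, gene coverage of at least 50% of the median core genome coverage for presence) to toy data; the data structures, function names, and coverage values are hypothetical, and only accessory genes are returned for brevity.

```python
from statistics import median

DOMINANCE_THRESHOLD = 0.7   # minimum strain frequency to call a strain dominant
COVERAGE_FRACTION = 0.5     # minimum fraction of the median core coverage


def dominant_strain(strain_freqs):
    """Return the dominant strain of a sample, or None if there is none."""
    strain, freq = max(strain_freqs.items(), key=lambda item: item[1])
    return strain if freq >= DOMINANCE_THRESHOLD else None


def assigned_accessory_genes(gene_coverage, core_genes):
    """Accessory genes whose coverage reaches the core-based cutoff."""
    core_median = median(gene_coverage[g] for g in core_genes)
    cutoff = COVERAGE_FRACTION * core_median
    return {
        gene
        for gene, cov in gene_coverage.items()
        if gene not in core_genes and cov >= cutoff
    }


# Toy sample: one strain at 80% frequency, three core and three accessory genes.
strain_freqs = {"S1a": 0.8, "S2b": 0.2}
coverage = {"core1": 100, "core2": 120, "core3": 80,
            "accA": 60, "accB": 20, "accC": 50}
core = {"core1", "core2", "core3"}

strain = dominant_strain(strain_freqs)
assigned = assigned_accessory_genes(coverage, core)
print(strain, sorted(assigned))
```

Merging the strain-assigned genes of all samples with the same dominant strain, as described above, would then be a set union over the per-sample results.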

For both symbionts, the majority of the accessory genes could be assigned to strains (fig. 3); we assigned 731 genes (76% of the SOX accessory genes) to five SOX strains that are dominant in 12 mussel samples, where some genes were assigned to multiple strains (fig. 3; supplementary table S3A, Supplementary Material online). The clades differ in the number of accessory genes, where clade S2 contains the largest number and clade S1 the smallest number of accessory genes. Additionally, S2 contains 169 genes that cannot be found in any other clade, that is, they are clade-specific genes. In contrast, each of the other clades has at most 71 specific genes. We also identified 67 genes that were assigned to all strains, of which 18 are multi-copy genes. For MOX, we could assign 276 accessory genes (67% of the MOX accessory genes) to three different strains that were dominant in ten different samples (fig. 3; supplementary table S3B, Supplementary Material online). Of these, 65 genes are specific to clade M1, 132 are specific to clade M2, and 77 genes were shared among all three strains, of which 34 are multi-copy. We found that the reconstructed gene content of samples with the same dominant strain overlaps to a large extent, which serves as additional support for the robustness of our approach (supplementary table S3, Supplementary Material online). A pangenome analysis of these reconstructed strains shows that both pangenomes are closed (i.e., α > 1; SOX α = 1.27; MOX α = 1.62), indicating that the pangenome sizes remain constant when more strains are sampled (Tettelin et al. 2008).
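
The openness parameter α follows the power-law model of Tettelin et al. (2008), in which the number of new genes n found when adding the N-th genome decays as n = κN^(−α), and α > 1 indicates a closed pangenome. The sketch below estimates α by ordinary least squares on log-transformed values. Both the fitting procedure and the synthetic input are assumptions for illustration; the authors' actual fitting method and per-strain counts are not shown here.

```python
import math


def fit_alpha(new_gene_counts):
    """Estimate alpha in n = kappa * N**(-alpha) by log-log least squares.

    new_gene_counts: list of (N, n) pairs, where n is the number of new
    genes observed when the N-th genome is added (n must be positive).
    """
    xs = [math.log(n_genomes) for n_genomes, _ in new_gene_counts]
    ys = [math.log(n_new) for _, n_new in new_gene_counts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    return -slope  # the slope of log n against log N is -alpha


# Synthetic counts generated from kappa = 200 and alpha = 1.5.
synthetic = [(n, 200 * n ** -1.5) for n in range(2, 7)]
alpha = fit_alpha(synthetic)
print(round(alpha, 3), "closed" if alpha > 1 else "open")
```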

Fig. 3.

Accessory gene content of reconstructed strains in SOX and MOX. Presence–absence patterns for accessory gene clusters are shown. Gray gene clusters refer to accessory genes that could not be assigned to any strain, whereas colored gene clusters indicate the clade affiliation. Blocks of gene clusters are sorted by size. For interpretation, core strain phylogenies are shown on the left. Based on the concatenated core gene alignment, IQ-TREE 2 using the GTR + G8 model and 1,000 approximate bootstrap replicates resolved these trees with all branches supported >85% (Minh et al. 2020). A network representation of the strain relationships is presented in Fig. 1 of (Romero Picazo et al. 2019).

To gain insights into the pace of gene content evolution, we estimated pairwise strain distances in terms of differentially present genes and of SNVs. We found that strain pairs generally have more SNVs compared with differentially present genes (supplementary fig. S7 and table S4, Supplementary Material online), which supports that gene content changes occur less frequently than substitutions. The positive correlation between the number of differentially present genes and SNVs supports a positive relationship between gene content changes and substitutions, which is compatible with gene content evolution by vertical inheritance.
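
The two pairwise distances compared here can be sketched as follows: the number of differentially present genes is the size of the symmetric difference between two strains' gene sets, and the SNV distance is the number of differing positions between their aligned core sequences. The strain names, gene sets, and sequences are toy values, and the equal-length alignment is an assumption of this sketch.

```python
from itertools import combinations


def gene_content_distance(genes_a, genes_b):
    """Number of genes present in exactly one of the two strains."""
    return len(set(genes_a) ^ set(genes_b))


def snv_distance(seq_a, seq_b):
    """Number of differing positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Toy strains: accessory gene sets and short aligned core sequences.
strain_genes = {"X": {"g1", "g2"}, "Y": {"g1", "g3"}, "Z": {"g4"}}
strain_seqs = {"X": "ACGTAC", "Y": "ACGTTC", "Z": "TCGATG"}

distances = {
    (a, b): (
        gene_content_distance(strain_genes[a], strain_genes[b]),
        snv_distance(strain_seqs[a], strain_seqs[b]),
    )
    for a, b in combinations(sorted(strain_genes), 2)
}
for pair, (gene_diff, snv_diff) in distances.items():
    print(pair, gene_diff, snv_diff)
```

With real strains, plotting or correlating the two values per pair gives the comparison between gene content changes and substitutions discussed above.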

Here, by extracting genes present in samples with a dominant strain, we could assign a large proportion of accessory genes to SOX and MOX strains. We found a high level of gene sharing among strains within the same clade, which supports that clades are characterized by gene content and that gene content evolves mainly by vertical inheritance.

The Accessory Genome Is Less Diverged Than the Core Genome for Both Symbionts

To investigate the evolution of the accessory gene sequences, we identified SNVs on all genes in the pangenomes. The number of SNVs per kilobase pair (SNVs/kbp) is higher in SOX than in MOX for all gene classes (core, single-copy accessory, and multi-copy accessory) (table 1). For both species, the SNVs/kbp is lower in the accessory genome than in the core genome, where many of the single-copy accessory genes do not have any SNV (supplementary fig. S8A, Supplementary Material online; table 1): 475 genes in SOX (51%) and 298 (79%) in MOX. Furthermore, multi-copy genes have the highest SNVs/kbp (table 1), which suggests that the divergence in multi-copy genes is overestimated due to the inclusion of paralogs.
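
The SNV densities quoted from table 1 follow from the SNV counts and total lengths listed there. The short sketch below recomputes SNVs/kbp for each gene class as 1,000 × (number of SNVs) / (total length in bp); the dictionary layout and function name are illustrative choices, while the numbers are copied from table 1.

```python
# (number of SNVs, total length in bp) per gene class, from table 1.
TABLE1 = {
    ("SOX", "core"): (17835, 1386003),
    ("SOX", "single-copy accessory"): (6791, 863319),
    ("SOX", "multi-copy accessory"): (789, 21735),
    ("MOX", "core"): (4585, 1961220),
    ("MOX", "single-copy accessory"): (245, 254967),
    ("MOX", "multi-copy accessory"): (496, 19686),
}


def snvs_per_kbp(n_snvs, length_bp):
    """SNV density per kilobase pair."""
    return 1000 * n_snvs / length_bp


density = {key: snvs_per_kbp(*values) for key, values in TABLE1.items()}
for (symbiont, gene_class), value in density.items():
    print(f"{symbiont} {gene_class}: {value:.3f} SNVs/kbp")
```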

To study the selection pressure on the pangenomes, we estimated pN/pS for the single-copy genes, where the analysis is restricted to genes with at least one SNV. In MOX, core and accessory genes show a similar pS distribution and also a similar pN/pS distribution (supplementary fig. S8B and C, Supplementary Material online). Thus, although divergence is low in MOX, the selection pressure acting on the core and accessory genome is similar. In contrast, we observed that the pN/pS distribution is shifted to lower values for the SOX core genes compared with SOX accessory genes (table 1; supplementary fig. S8B, Supplementary Material online). This could suggest that the strength of purifying selection is higher on the core genome. However, the pS distribution is shifted to higher values for the SOX core genes compared with the accessory genes (table 1; supplementary fig. S8C, Supplementary Material online). It has been observed that the relative rate of nonsynonymous to synonymous substitutions depends on the divergence of the analyzed species (Rocha et al. 2006; Romero Picazo et al. 2019). Indeed, we found that the joint distributions of pS and pN/pS largely overlap for the SOX core and accessory genome, with some accessory genes having a very low pS and a high pN/pS (supplementary fig. S9A, Supplementary Material online). We thus cannot conclude that the strength of selection is different between SOX core and accessory genes.

To rule out the possible impact of differential coverage between core and accessory genes in the pN and pS estimates, we studied the distribution of these measures by only considering SNVs found across those samples containing a dominant strain. In these samples, SNV detection should be less biased by coverage because core and accessory genes have similar coverage (supplementary fig. S6, Supplementary Material online). Indeed, we found that the distributions of pN/pS and pS estimated only from those samples are similar to the distributions previously estimated for the full dataset (supplementary fig. S9, Supplementary Material online), which suggests that variation in coverage does not explain the differences observed between the core and accessory genome.

The frequencies of SNVs/kbp presented here suggest that the accessory gene sequences are less divergent than the core genes in both symbionts. In a scenario where the accessory genes are enriched for mobile genes and have been acquired multiple times independently from diverse organisms, the accessory genome is expected to be very diverse. We thus conclude that this scenario is not realistic. In contrast, the majority of the accessory genome has probably only been acquired once or has been present in the ancestor of the population and was subsequently lost in some strains. Then, accessory genes are found at lower frequency and are thus expected to be less diverged than core genes. Furthermore, the larger number of genes without detected mutations in the accessory genome of MOX compared with SOX suggests that gene family diversification is more recent in MOX compared with SOX. We could not detect a difference in selection pressure acting on the core and accessory genomes, mainly due to the low divergence level of these genes.

Genes in the Accessory Genome Mostly Function in Genome Plasticity

To further investigate the origin of the accessory genes in the MOX and SOX symbiont populations, we studied the distribution of functional categories across the core and accessory genomes. Functional annotation of all genes in the pangenomes by clusters of orthologous groups (COGs) revealed that functions associated with central metabolism are overrepresented in the core genomes of both SOX and MOX pangenomes (COG categories “translation, ribosomal structure and biogenesis” and “amino-acid transport and metabolism”). In SOX, multiple additional categories related to central metabolism are overrepresented (fig. 4A and B).

Fig. 4.

COG annotations for SOX and MOX pangenomes. Distribution of COG annotations for (A) SOX and (B) MOX core and accessory genomes. Stars represent the significance of Fisher's exact test for the differential presence of a specific COG category between core and accessory genes. *P-value < 0.05, **P-value < 5 × 10−5, ***P-value < 5 × 10−10 (FDR-corrected P-values). (C) Distribution of COG categories for 760 homologous pairs between SOX and MOX pangenomes, where 51 homologs involve at least one accessory gene (functions listed in supplementary table S7, Supplementary Material online). Ten homologs are accessory in both genomes (including two transposases and two restriction-modification system genes). Twenty-eight pairs involve a core gene in MOX and an accessory gene in SOX (including five genes in the COG category “cell wall/membrane/envelope biogenesis,” and also transposases, integrases, and restriction-modification system genes). Thirteen pairs involve a SOX core and a MOX accessory gene (including three genes in the COG category “coenzyme transport and metabolism”).
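The per-category test named in the caption can be illustrated with a minimal sketch. It assumes a 2 × 2 table of gene counts (accessory or core, inside or outside one COG category) and computes a two-sided Fisher's exact P-value from the hypergeometric distribution. The function name and the counts in the usage line are hypothetical and are not taken from the study. The FDR correction mentioned in the caption is not included.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Illustrative layout (an assumption, not the authors' code):
      a = accessory genes in the category, b = accessory genes outside it,
      c = core genes in the category,      d = core genes outside it.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are at most as probable as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts, for illustration only.
p = fisher_exact_two_sided(30, 170, 20, 780)
```

Implementing the test with `math.comb` keeps the sketch free of external dependencies. In practice, a statistics library would usually be preferred.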

Functions associated with genome plasticity are overrepresented in the accessory genomes of both populations, namely “Mobilome: prophages, transposons,” “Defense mechanisms,” and “DNA replication, recombination and repair.” The COG category “Mobilome: prophages, transposons” is more prevalent in the MOX pangenome (62 genes in MOX, 26 genes in SOX). However, a larger proportion of the genes in this category is accessory in SOX (19 genes, 73%) than in MOX (17 genes, 27%). Investigating the functional annotation of the genes in detail, we identified 14 integrases in SOX and five in MOX. Additionally, we found 70 transposases in MOX and 20 in SOX. Of these, 16 (23%) were identified as multi-copy genes in MOX, while three (15%) were identified as multi-copy genes in SOX.

The COG category “Defense mechanisms” is overrepresented in the accessory genomes of both populations (49 accessory out of 72 genes in SOX, 68%; 23 accessory out of 54 genes in MOX, 43%). We found a high number of genes associated with restriction-modification systems: 72 in MOX, where 24 (33%) are accessory, and 81 in SOX, where 48 genes (59%) are accessory. Additionally, we found a larger repertoire of CRISPR-Cas genes in SOX (23 genes, thereof 19 accessory) in comparison to MOX (seven genes, thereof none accessory). The analysis of the respective contigs with CRISPRCasFinder (Couvin et al. 2018) showed that nine cas genes on five contigs were found to be associated with CRISPR arrays of different types (types IF, IIB, three times IIU). Notably, all nine cas genes with a CRISPR array belong to the SOX accessory genome. The prevalence of defense mechanisms raises the question of whether viruses are present in the mussel environment. The prediction of phages on all metagenomic contigs revealed 51 potential viral contigs, of which there is only one high-confidence complete phage contig (termed Gokushovirinae sp. isolate VC_68_0, supplementary table S5, Supplementary Material online). Thus, viruses are either not highly abundant in the B. brooksi microbiome or could not be detected in the data analyzed here.

Several functions are differentially present between the pangenomes of the symbionts (fig. 3A and B). For example, we found that MOX contains 51 genes related to “Cell motility,” where 43 (84%) are in the core genome, whereas only three core genes were identified in SOX for this category. Additionally, 146 genes were annotated as “Signal transduction mechanisms” in MOX (21 accessory genes, 14%), whereas only 31 genes were found in SOX (11 accessory genes, 35%). To further examine the functions that are differentially present among strains, we inspected the genes that have been assigned to strains (supplementary table S6, Supplementary Material online). We observed that most of the strain-specific genes in SOX belong to the categories “Defense mechanisms” and “Replication, recombination and repair,” followed by the category “Cell wall/membrane/envelope biogenesis.” Notably, the COG category “Cell wall/membrane/envelope biogenesis” also includes the toxin-related genes that have been described previously to be variable across symbiont species (Sayavedra et al. 2015). In MOX, the categories “Defense mechanisms,” “Replication, recombination and repair,” and “Cell wall/membrane/envelope biogenesis” also differ between strains and, additionally, MOX strains differ in genes involved in “Signal transduction mechanisms.” For SOX and MOX, categories related to metabolism are rarely found to be strain-specific. The differences among the strain-specific functional categories suggest that the strains differ in the repertoire of genes that are involved in interactions between organisms, including interactions with the mussel host and with other bacteria or mobile elements.

We conclude that in both symbiont pangenomes, the accessory genomes comprise mostly gene functions related to genome plasticity. The high prevalence of transposases and restriction-modification systems suggests that genome rearrangements contribute to gene content variation among strains. Additionally, we found that MOX has a larger repertoire of mobilome-related genes and contains more genes related to cell motility and signal transduction.

No Evidence for HGT between Both Symbiont Species

For endosymbiotic bacteria, the concept of an “intracellular arena” posits that the host cell serves as an arena for bacteria, where symbionts can acquire putatively beneficial genes from a niche-specific gene pool (Bordenstein and Reznikoff 2005; Newton and Bordenstein 2011). To identify mobile elements with the potential to transfer genes in that environment, we studied the diversity of transposases in the pangenomes. Transposases can duplicate within or between genomes, where recent duplications will show a low diversity between both transposase copies. Here, we observed a low genetic diversity only among transposases within the same species, whereas the minimum divergence between species is 0.50 nucleotide substitutions per site (supplementary fig. S10, Supplementary Material online). Thus, recent transposase duplication events have occurred only within species and not between species, which leads to the conclusion that there is no evidence for recent transfer of transposases between both symbiont species.

In addition, we aimed to identify possible HGT events between SOX and MOX symbionts. To this end, we inferred 760 homologous protein pairs between the two symbionts. Of these, 709 (93%) comprise core genes from both species, where the majority encode central metabolism functions (fig. 4C). The homologs that are core genes of both species were presumably present in the common ancestor of both species and were then inherited vertically, that is, they are orthologs. In addition, 51 homologs involve at least one accessory gene (supplementary table S7, Supplementary Material online; fig. 4C). These are candidates for a recent horizontal transfer between symbionts, where a transfer would result in a high sequence similarity between the homologs. However, the protein sequence identity between homologous pairs does not reach high values (less than 86% for all pairs, fig. 5). In addition, the distributions of sequence identity of core pairs and of pairs involving at least one accessory gene are not significantly different (Wilcoxon rank sum test P-value = 0.3502; fig. 5). We conclude that the homologous pairs involving an accessory gene are indeed orthologs, which were differentially lost in one or several strains, rather than being acquired recently by HGT between the symbionts.
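The comparison of the two identity distributions can be sketched as follows. This is a minimal Wilcoxon rank sum (Mann-Whitney) test that uses the normal approximation with a tie correction and no continuity correction. The function names and the identity values in the usage lines are hypothetical, and the approximation may differ slightly from the exact procedure used in the study.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks (ties get their average rank) and tie group sizes."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(x, y):
    """Two-sided P-value of the rank sum test (normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        return 1.0  # all values identical: no evidence of a shift
    z = (u1 - mean_u) / sqrt(var_u)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical protein identities (%), for illustration only.
core_pairs = [48.7, 52.1, 44.0, 61.3, 39.8, 85.6]
accessory_pairs = [46.3, 50.2, 41.5, 78.7, 43.9]
p_value = rank_sum_test(core_pairs, accessory_pairs)
```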

Fig. 5.

Empirical cumulative distribution of protein identity between orthologous pairs. Core pairs sequence identity: median 48.7, maximum 85.6; remaining pairs (i.e., containing at least one accessory gene) sequence identity: median 46.3, maximum 78.7.

Note that our approach might not be able to detect very recent transfers between SOX and MOX where both homologs are present in all of the samples. This limitation exists because a very recent between-species transfer within a single mussel may lead to discontinuous assemblies and exclusion of the transferred gene from both SOX and MOX contigs. However, transferred genes that accumulated mutations in the donor or recipient should be distinguishable during the assembly. Our approach is expected to perform well in detecting HGT events, where homologs are differentially present in the different mussels or where homologs accumulated more than 5% nucleotide divergence. That said, we did not find any gene cluster (i.e., a group of genes with less than 5% nucleotide divergence) that is present in both pangenomes and we only found homologs between SOX and MOX with less than 86% amino acid identity, which are presumably orthologs. Consequently, we conclude that HGT between the symbiont species is rare.

Discussion

Here, we used high-resolution metagenomics to examine the gene content of two co-occurring symbiont species that inhabit mussels from a single geographical site. We reconstructed the population pangenomes of the sampling site by applying a novel approach that links accessory genes to core genes across the chromosomal contigs of multiple samples. The pangenomes reconstructed here reflect the pangenomes of the population at the sampling site (i.e., not that of the entire species). The site pangenome is expected to be smaller than the species pangenome, since only populations from the same niche are included and no difference in niche-specific genes is expected. We thus expect that the SOX and MOX species pangenomes are larger with a larger proportion of accessory genomes than the site pangenomes reported here.

The reconstruction of pangenomes from short-read metagenomic sequencing data is a challenging task. When isolate genomes are available, pangenome analysis can include the mapping of metagenomes to infer the ecology of the species (Delmont and Eren 2018; Utter et al. 2020). For species that cannot be cultivated, however, only MAGs might be available for a pangenome analysis. Although MAGs might miss genomic regions compared with isolates (Nelson et al. 2020; Meziti et al. 2021), pangenomics based on MAGs is a widely used approach for studying the evolution of bacteria that are difficult to cultivate; recent analyses include Wolbachia sequenced with their hosts (Scholz et al. 2020), Sulfurovum from deep-sea hydrothermal vents (Moulana et al. 2020), Thaumarchaeota from river sediments (Sheridan et al. 2020), and Chlamydiae present in various habitats covered by the Earth Microbiome initiative (Köstlbacher et al. 2021). It is particularly difficult to study genome rearrangements and to infer unlinked MGEs such as plasmids from short-read metagenomes (Maguire et al. 2020; Nelson et al. 2020). Additional approaches such as isolation and sequencing, methylation patterns from long reads (Beaulaurier et al. 2018), or linking DNA by Hi-C (Yaffe and Relman 2020) would be necessary to resolve MGEs with confidence and to link them to their host. The inferred pangenomes reported here thus comprise the core and accessory genes located on the symbiont chromosomes. We use several steps to improve the accuracy of the pangenome reconstruction from metagenomes (fig. 1). First, we employ co-abundance binning to reconstruct core genomes and subsequently a network approach to include all accessory genes that are linked on contigs to core genes or accessory genes from other samples. Second, we estimate the coverages for all genes in the pangenomes in each sample, also including genes that have not been reconstructed on the contigs of that sample. 
Third, strain content is highly variable across samples, and we observed several samples that were dominated by one strain only. These strains show very good assembly statistics (e.g., SOX N50 above 30,000, supplementary table S2, Supplementary Material online) and have been used to infer the strains’ gene contents. We thus conclude that the reconstructed MAGs and the reconstructed pangenomes are of high quality. Finally, our analysis focused especially on the genes that can be assigned to strain clades; these have been maintained over long time scales and might have an evolutionary relevance. However, we might have missed low-frequency strains in our analysis, and accessory genes that are only present in samples with a high strain diversity might be missing from the assemblies. These missing genes are not yet fixed in a strain clade and might thus not be relevant for adaptation. We thus conclude that the potentially missing genes are transient and belong to the cloud genome, that is, they are rare or nearly unique genes (Koonin and Wolf 2008).

Whereas the SOX pangenome has fewer genes than that of MOX, the former has a higher fraction of accessory genes. The large SOX accessory genome is consistent with the recent finding that gene content variation among coexisting thiotrophic bacteria is common (Ansorge et al. 2020). We found that functions associated with genome plasticity are consistently present in the accessory genomes of both species. Among them are genes encoding defense mechanisms, in particular, genes related to restriction-modification systems. In addition to defense, restriction-modification systems can also function as MGEs. They have, for example, been shown to be involved in genome rearrangements in termite gut symbionts and in gene birth and death in the human gut bacterium Helicobacter pylori (Furuta et al. 2011; Zheng et al. 2016).

We find that MGEs, such as transposons, are more prevalent in MOX, which also has a high proportion of them in the core genome. Furthermore, MOX has a higher fraction of genes related to cell motility and signal transduction, which can be found in the core and accessory genome. Notably, both these functional categories have been found to be underrepresented in intracellular compared with free-living bacteria (Merhej et al. 2009; Lo et al. 2016); thus, they are more relevant for a free-living lifestyle. This suggests that MOX still pursues an active free-living life stage or that the association of MOX with Bathymodiolus is recent and ancient genes can still be found in the genome. Mussel phylogenies indeed support that the association with MOX is younger than that with SOX, where the clade comprising B. brooksi evolved from an ancestor with only the SOX symbiont about 10 million years ago (Lorion et al. 2013). Nevertheless, co-speciation of hosts and symbionts is rare in that system (Won et al. 2008) and it can thus not be ruled out that SOX and MOX symbiont populations have been replaced multiple times during mussel evolution.

Differences in pangenome size can also be caused by different rates of HGT. Nevertheless, we conclude that HGT is rare between species and between strain clades of both species (fig. 6). This conclusion is supported by the observations 1) that gene content differences between mussel individuals reflect the differences in strain composition, 2) that the accessory genome is less diverged than the core genome for both symbionts, and 3) that the homologs between SOX and MOX are not highly similar. Observations 1) and 2) support that accessory genes have been acquired at most once from another lineage and then evolved by descent with modification within the symbiont lineages. Gene loss or recombination within the strains might also contribute to accessory gene evolution, whereas multiple transfers from distantly related lineages or HGT between different strain clades are rare. Since the reconstruction of homologs between SOX and MOX also did not reveal signals of recent gene transfer between the species (observation 3), we also conclude that HGT between the two species is rare. Notably, when comparing SOX from different Bathymodiolus host species to its closest free-living species, a higher fraction of genes that potentially originated by HGT has been previously found (Sayavedra et al. 2015). It is thus tempting to speculate that HGT was very important in early SOX adaptation, where extensive acquisition by HGT happened during evolution of SOX symbionts from free-living bacteria. Our analysis is restricted to events within the SOX population at a particular site and is thus focused on recent evolution of symbiont strains that are already adapted to the symbiotic environment. Furthermore, we mainly focus on HGT within the mussel environment (fig. 6), whereas HGT with other community members is an exciting direction for future research. 
SOX belongs to the Thioglobaceae family, which contains many endosymbiotic bacteria that are related to the free-living SUP05 (Ansorge et al. 2020). MOX belongs to the Methyloprofundus clade, also containing free-living members (Hirayama et al. 2022), which opens the possibility of HGT between closely related species during the free-living phase.

Fig. 6.

Routes of HGT studied here.

Notably, our conclusion that HGT is rare for SOX and MOX contrasts with what is known for most other bacterial species, where HGT is a major evolutionary driver (Treangen and Rocha 2011; Brockhurst et al. 2019; Graña-Miraglia et al. 2017; Levade et al. 2017) and HGT events can even be used as epidemiological markers for bacteria with few differences in the core genome (Mateo-Estrada et al. 2021; Castillo-Ramírez 2022). We here developed a population genetics approach to compare measures of genetic isolation that are based either on nucleotide diversity or on gene content diversity. In the future, such approaches might also be applicable in the pangenomic epidemiology setting to integrate all information for the analysis of pathogen transmission (Castillo-Ramírez 2022).

Given that both symbionts can be found in the same mussel gill bacteriocyte (Duperron et al. 2007), the rarity of HGT contrasts with their potential ability to access the habitat-specific gene pool as described for other species (Bordenstein and Reznikoff 2005; Newton and Bordenstein 2011; Polz et al. 2013). The rarity of HGT in this environment might be explained by the absence of DNA transfer mechanisms in the symbionts or by environmental properties. Regarding the latter, the intracellular environment may interfere with mechanisms that rely on the transfer of free DNA such as natural transformation, since the DNA could be quickly degraded by the prevalent mussel digestive enzymes (Ponnudurai et al. 2017). Likely HGT mechanisms in such an environment are those in which the DNA is transferred in a packaged manner (such as in phages, gene transfer agents, or outer membrane vesicles) or in direct contact between donor and recipient (as in conjugation). However, these mechanisms are most likely to transfer genes between symbionts within a single bacteriocyte only. We thus conclude that potential gene transfer events rarely establish in the population and that the host association results in genetically isolated subpopulations where HGT is limited. This is consistent with the observation that endosymbionts are rarely connected in gene transfer networks (Popa et al. 2011).

Symbioses are traditionally distinguished by being open (symbionts are environmentally acquired, resulting in frequent HGT and average or high GC content), closed (symbionts are parentally transmitted, resulting in the absence of HGT, low GC content, and genome degradation), or mixed (symbionts are mainly parentally transmitted, but occasional environmental acquisition occurs) (Perreau and Moran 2022). Here, we observe that the Bathymodiolus symbionts are environmentally transmitted, do not engage frequently in HGT, and have an average GC content of 38%. Thus, this symbiosis does not fit into the traditional division. Environmental transmission is frequent in marine symbioses, potentially due to the opportunity to take up locally adapted symbiont strains (Breusing et al. 2022). The uptake is often restricted to an early developmental stage (Franke et al. 2021), where the innate immune system has an important role in establishing symbioses in invertebrates, resulting in highly specific symbiont acquisition by the host from the environment (Nyholm and Graf 2012). Additionally, the long-term host association results in low strain diversity within one host, as also observed for the marine bacterium Vibrio fischeri colonizing the squid light organ (Bongrand et al. 2022). We conclude that these symbioses are not open but rather restricted; we thus term them narrow symbioses. In narrow symbioses, the tight host association leads to genetically isolated subpopulations with low frequencies of gene transfer within the host environment, where low rates of recombination can rescue the genomes from extensive degradation (Russell et al. 2020). Pangenome evolution differs substantially between open and narrow symbioses, where gene content evolution of the latter symbionts is mainly driven by differential gene loss and HGT happens only occasionally.

Materials and Methods

Collection, Sequencing, and Core Genome Reconstruction

Details of the sample processing and sequencing have been described previously (Romero Picazo et al. 2019). In brief, 23 Bathymodiolus brooksi mussels were collected from the cold seep location MC853 at the northern Gulf of Mexico in May 2015. B. brooksi and B. childressi occurred at this site, as also observed before (Faure et al. 2015). Mussel gill homogenate was sequenced using Illumina HiSeq2500 (250 bp paired-end reads with a median insert size of 400 bp, raw reads BioProject PRJNA508280). The core genomes were reconstructed as described previously (Romero Picazo et al. 2019) (see also fig. 1). In brief, individual samples were assembled with metaSPAdes (Nurk et al. 2017) and predicted genes from all samples were clustered with identity of 95% into gene clusters that represent the non-redundant gene catalog. Next, we estimated the gene abundances in each sample and performed co-abundance gene segregation by using a canopy clustering algorithm (Nielsen et al. 2014), which groups gene clusters into bins that covary in their abundances across the different samples. Two metagenomic species (MGS) were classified as the SOX and MOX core genome, respectively, and lowly and highly covered genes were further filtered. We additionally identified a third MGS (MGS3), which is present in a single sample in low abundance and for which no taxonomy could be inferred. After discarding samples with high variance in symbiont marker gene coverages and with low coverage after binning, 19 mussel samples remained as the basis for the following analysis. In addition to the analysis presented in Romero Picazo et al. (2019), singletons that originated from discarded samples were removed from the core genomes. This results in a SOX core genome of 1,408 genes and a MOX core genome of 2,443 genes.

Pangenome Reconstruction

Differences in strain composition generate different assembly fragmentation patterns across samples, where genome regions that are present only in specific strains tend to result in isolated contigs. Here, we restore the linkage between contigs that belong to the same species and identify, from all the genes present in the catalog, the accessory and multi-copy genes that belong to SOX and MOX symbiont pangenomes. Our approach is based on the nonredundant gene catalog, which contains gene clusters present across samples with at least 95% of sequence identity (fig. 1).

We used a network traversal approach, where the genes that are part of core gene clusters were used as initial seeds. First, we located the seed genes on the contigs of the 19 samples. The first layer of the pangenome contains all additional gene clusters having gene members that are found on any of those contigs. Next, we used the genes that are part of these newly added gene clusters as seeds to expand the network. The extension of the network continues in an iterative manner until no additional genes can be added to the pangenome (supplementary table S1, Supplementary Material online). Microbial genomes contain multi-copy genes such as transposons. The use of multi-copy genes as seeds would potentially cause the spurious linkage of genome fragments. In order to avoid such artifacts, we only considered those genes as seeds that originate from clusters with sequence identity of at least 0.95 and that contain at most one gene from each sample. Nevertheless, as multi-copy genes can naturally be found in bacterial genomes, they were included in the pangenomes. We discarded a singleton gene from the SOX core genome connecting to the MOX pangenome. The link between the two pangenomes occurred in the second layer of the network, when two contigs containing each two genes in each pangenome were connected by another contig containing two genes. Although the SOX singleton certainly has coverage corresponding to a core gene, it is present in only one sample and has no functional annotation. Therefore, we suspect that the singleton has been misassigned as a SOX core gene and decided not to include it in the analysis. To avoid misassignment of genes in the MOX pangenome, we additionally discarded the two genes belonging to the contig connecting both pangenomes.
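The iterative expansion described above can be sketched as a breadth-first traversal over contigs and gene clusters. This is a minimal illustration under simplifying assumptions, not the implementation used in the study. The function name `expand_pangenome`, the input layout (a mapping from contig identifier to the gene clusters found on it), and the `multi_copy` set of clusters excluded as seeds are all hypothetical.

```python
def expand_pangenome(contigs, core_clusters, multi_copy=frozenset()):
    """Grow a pangenome layer by layer from core gene clusters.

    contigs:       dict mapping contig id -> list of gene cluster ids on it
                   (contigs from all samples pooled together)
    core_clusters: set of gene cluster ids forming the core genome
    multi_copy:    cluster ids that may join the pangenome but are never
                   used as seeds, to avoid spurious linkage
    Returns the set of pangenome clusters and the list of added layers.
    """
    # Index: which contigs carry a member of each gene cluster?
    cluster_to_contigs = {}
    for contig_id, clusters in contigs.items():
        for cluster in clusters:
            cluster_to_contigs.setdefault(cluster, set()).add(contig_id)

    pangenome = set(core_clusters)
    seeds = set(core_clusters) - set(multi_copy)
    visited_contigs = set()
    layers = []

    while seeds:
        # Contigs reached by the current seeds that were not yet inspected.
        reached = set()
        for seed in seeds:
            reached |= cluster_to_contigs.get(seed, set())
        reached -= visited_contigs
        visited_contigs |= reached

        # Clusters on those contigs that are new to the pangenome.
        added = set()
        for contig_id in reached:
            added |= set(contigs[contig_id])
        added -= pangenome
        if not added:
            break

        layers.append(added)
        pangenome |= added
        seeds = added - set(multi_copy)

    return pangenome, layers


# Toy example with hypothetical identifiers.
toy_contigs = {
    "s1_c1": ["core1", "accA"],
    "s2_c1": ["accA", "accB"],
    "s2_c2": ["accB", "tnp"],
    "s3_c1": ["tnp", "other"],
}
toy_pangenome, toy_layers = expand_pangenome(toy_contigs, {"core1"}, {"tnp"})
```

In the toy example, the multi-copy cluster `tnp` is added to the pangenome but does not pull in the contig `s3_c1`, which illustrates why multi-copy genes are excluded as seeds.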

The quality of the pangenomes has been assessed by studying the distribution of two different gene cluster features across the pangenome layers, the cluster size and the cluster sequence identity (supplementary table S1, Supplementary Material online). For clusters added to layers of the pangenome, the gene cluster size is below 19 (the number of samples). This is expected for gene clusters that are not present in every strain and supports their status as accessory. Additionally, the median sequence identity of the clusters is close to one (>0.99), which supports that the gene clusters added to the pangenomes are not affected by contamination, that is, genes from a different MGS. We denote the additional gene clusters added to the network as “accessory” genes, where genes with a higher coverage than the maximum core gene coverage in any sample are denoted as “multi-copy accessory,” in contrast to “single-copy accessory.” In addition to the core genes, we observed genes showing a similar coverage to the median core genome coverage across samples (median sample coverage ± 0.15 × median core sample coverage for all samples, 114 SOX genes, 9 MOX genes). These genes were added to the core genome for all the presented analyses.
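The coverage-based labels described above can be illustrated with a short sketch. It assumes per-sample coverages for one gene together with the coverages of the core genes in each sample. The function name, the input layout, and the order in which the rules are checked are assumptions made for illustration.

```python
from statistics import median


def classify_gene(gene_cov, core_cov_per_sample, tolerance=0.15):
    """Label a non-core gene from its coverage across samples.

    gene_cov:            list with the gene's coverage in each sample
    core_cov_per_sample: list (same order) of lists holding the core gene
                         coverages observed in each sample
    tolerance:           allowed relative deviation from the median core
                         coverage (0.15 follows the text)
    """
    medians = [median(cov) for cov in core_cov_per_sample]
    maxima = [max(cov) for cov in core_cov_per_sample]

    # Coverage close to the median core coverage in every sample:
    # treated like a core gene.
    if all(abs(g - m) <= tolerance * m for g, m in zip(gene_cov, medians)):
        return "core-like"
    # Coverage above the highest core gene coverage in any sample.
    if any(g > mx for g, mx in zip(gene_cov, maxima)):
        return "multi-copy accessory"
    return "single-copy accessory"


# Hypothetical coverages for two samples.
core_cov = [[9.0, 10.0, 11.0], [18.0, 20.0, 22.0]]
label = classify_gene([4.0, 0.0], core_cov)
```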

The contigs connected in the network of each species correspond to the MAGs of each sample. We assessed the degree of completeness and contamination for MOX and SOX MAGs using the set of Gammaproteobacteria marker genes and the “taxonomy_wf” workflow from CheckM v1.0.18 (supplementary table S2, Supplementary Material online) (Parks et al. 2015).

Measures of Population Diversity and Diversification

SNVs were detected on the non-redundant gene catalog as described previously (Romero Picazo et al. 2019). Briefly, the reads in each sample were first downsampled to the minimum number of reads per sample and mapped to the gene catalog using bwa mem (Li 2013). For the analysis of the SOX pangenome, the reads of each sample were subsampled to the smallest median core coverage for SOX, that is, when the smallest SOX core coverage is X and the sample has a SOX core coverage of Y, then all the reads were subsampled to a proportion of X/Y. For the analysis of the MOX pangenome, the subsampling was repeated based on the MOX core coverages. This normalization does not account for coverage differences due to different accessory gene abundance within a sample. LoFreq was used for probabilistic realignment and variant calling of each sample independently (Wilm et al. 2012) and the detected SNVs have been hard filtered according to GATK best practices.
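The X/Y normalization can be made concrete with a small sketch that derives the read fraction to keep for each sample from its median core coverage. The function name and the coverage values are hypothetical. The actual read subsampling would be carried out by a separate tool.

```python
def subsampling_fractions(median_core_cov):
    """Map each sample to the fraction of reads to keep (X / Y).

    median_core_cov: dict mapping sample id -> median core coverage (Y)
    X is the smallest median core coverage over all samples.
    """
    smallest = min(median_core_cov.values())
    return {sample: smallest / cov for sample, cov in median_core_cov.items()}


# Hypothetical median core coverages for three samples.
fractions = subsampling_fractions({"A": 40.0, "B": 10.0, "C": 25.0})
```

The sample with the smallest core coverage keeps all of its reads, and every other sample is reduced to the same expected core coverage.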

Strain reconstruction for the core genomes has been performed with DESMAN (De novo Extraction of Strains from Metagenomes) as described before (Quince et al. 2017; Romero Picazo et al. 2019). In brief, this method uses the SNV frequency covariation across samples to assign the SNV states to a specific genotype. DESMAN converged with a posterior mean deviance lower than 5% for 11 SOX strains and 6 MOX strains.

The SNV data is used for calculating intrasample and intersample nucleotide diversity (π) and FST as described previously (Schloissnig et al. 2013; Romero Picazo et al. 2019). π estimates the average frequency of nucleotide differences over all pairs in a sample. Small FST values indicate that the samples stem from the same population, that is, their SNV states and frequencies are similar. In contrast, large FST values indicate that the samples constitute subpopulations, that is, the samples differ in the frequencies of their SNV states.

To study the microbial community composition, strain frequencies and relatedness are used for calculating α- and β-diversity as described previously (Romero Picazo et al. 2019). In brief, we estimated α-diversity using phylogenetic species evenness (Helmus et al. 2007) implemented in the R package “Picante” (Kembel et al. 2010). β-diversity was estimated using the weighted UniFrac distance, which quantifies differences in strain community composition between two samples and accounts for phylogenetic relationships (Chen 2018).

To estimate the degree of genetic isolation based on gene content, we have derived the intra- and inter-sample gene diversity measures. We here define the gene diversity φ, which estimates the average frequency of gene content differences analogous to the nucleotide diversity π. For frequencies estimated from metagenomes, it is estimated as:

$$\phi(S) = \frac{2}{|G|} \sum_{i=1}^{|G|} \frac{\gamma_{i,S}\,\min(0,\; c_S - \gamma_{i,S})}{c_S\,(c_S - 1)},$$

where $|G|$ is the number of genes in the pangenome, $\gamma_{i,S}$ is the coverage of gene $i$ in sample $S$ (measured as mean coverage across gene positions) and $c_S$ is the median coverage of core genes in sample $S$. Note that the minimum ensures that the difference between accessory gene coverage and median core coverage cannot be negative. Note that this definition differs from the previously defined measure of genome fluidity, which is based on the ratio of unique gene families to the sum of gene families in pairs of genomes (Kislyuk et al. 2011). In contrast, φ as defined here gives an estimation for the average proportion of gene differences based on all genes in the pangenome, that is, shared genes are only counted once.
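As a rough illustration of the intra-sample gene diversity defined above, the following Python sketch computes φ(S) from per-gene coverages. The function name and the coverage values are hypothetical, and the code is not taken from the deposited scripts. One assumption is flagged explicitly: the formula is printed with min(0, ·), but because the accompanying note says that the difference between core coverage and gene coverage cannot become negative, the sketch clamps that difference at zero from below, that is, it uses max(0, c_S − γ_{i,S}).

```python
def gene_diversity(gene_cov, core_cov):
    """Intra-sample gene diversity phi(S).

    gene_cov: mean coverage of every pangenome gene in the sample (gamma_i,S)
    core_cov: median coverage of the core genes in the sample (c_S)
    """
    if not gene_cov:
        raise ValueError("no genes given")
    if core_cov <= 1:
        raise ValueError("core coverage must exceed 1")
    total = 0.0
    for cov in gene_cov:
        # ASSUMPTION: lower clamp at zero, so genes covered more deeply than
        # the core (e.g. multi-copy genes) contribute nothing.
        total += cov * max(0.0, core_cov - cov)
    return 2.0 * total / (len(gene_cov) * core_cov * (core_cov - 1.0))


# Hypothetical sample: two core-like genes, one gene at half the core
# coverage, and one absent gene.
phi = gene_diversity([10.0, 10.0, 5.0, 0.0], core_cov=10.0)
print(round(phi, 4))
```

Under this reading, genes that are present in all cells or absent from all cells contribute zero, and a gene carried by about half of the population contributes the most, as for a biallelic SNV in π.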

The inter-sample gene diversity is then estimated as:

$$\phi(S_1,S_2) = \frac{1}{|G|} \sum_{i=1}^{|G|} \left( \frac{\gamma_{i,S_1}\,\min(0,\; c_{S_2} - \gamma_{i,S_2})}{c_{S_1}\,c_{S_2}} + \frac{\gamma_{i,S_2}\,\min(0,\; c_{S_1} - \gamma_{i,S_1})}{c_{S_1}\,c_{S_2}} \right),$$

where $S_1$ and $S_2$ correspond to the two samples compared. Finally, analogous to FST, we define the pangenome fixation index PST, which measures the genetic differentiation based on the gene diversity present within and between populations:

$$P_{ST}(S_1,S_2) = 1 - \frac{\phi(S_1) + \phi(S_2)}{2\,\phi(S_1,S_2)}.$$

Small PST values indicate that both samples have similar gene content and gene frequencies, whereas large PST values indicate that the samples differ in their gene frequencies.
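The relation between intra-sample diversity, inter-sample diversity, and PST can be sketched in Python as follows. All function names and coverage values are hypothetical. As an assumption, the clamp is read as max(0, c − γ), because the text states that the difference cannot become negative; the sketch also assumes that both coverage lists are in the same gene order.

```python
def _clamped(core_cov, gene_cov):
    # ASSUMPTION: the difference is clamped at zero from below.
    return max(0.0, core_cov - gene_cov)


def intra_diversity(cov, core):
    total = sum(g * _clamped(core, g) for g in cov)
    return 2.0 * total / (len(cov) * core * (core - 1.0))


def inter_diversity(cov1, core1, cov2, core2):
    if len(cov1) != len(cov2):
        raise ValueError("both samples must cover the same gene list")
    total = sum(
        g1 * _clamped(core2, g2) + g2 * _clamped(core1, g1)
        for g1, g2 in zip(cov1, cov2)
    )
    return total / (len(cov1) * core1 * core2)


def pangenome_fixation_index(cov1, core1, cov2, core2):
    between = inter_diversity(cov1, core1, cov2, core2)
    if between == 0.0:
        return float("nan")  # undefined when the samples show no differences
    within = intra_diversity(cov1, core1) + intra_diversity(cov2, core2)
    return 1.0 - within / (2.0 * between)


# Hypothetical samples: one shared gene and one private gene in each sample.
s1, s2 = [10.0, 10.0, 0.0], [10.0, 0.0, 10.0]
phi_between = inter_diversity(s1, 10.0, s2, 10.0)
pst = pangenome_fixation_index(s1, 10.0, s2, 10.0)
print(round(phi_between, 4), pst)
```

In this toy case every gene is fixed within each sample, so the within-sample diversity is zero and PST reaches its maximum of 1.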

pN/pS, the ratio of non-synonymous to synonymous polymorphism rates, is a variant of dN/dS and is estimated based on intra-species SNVs as described previously (Romero Picazo et al. 2019). In brief, the observed ratio of nonsynonymous to synonymous mutations based on biallelic SNVs is divided by the expected ratio of nonsynonymous to synonymous mutations based on all possible mutations in each of the codons. pN/pS was estimated individually for each of the genes in the two symbiont species as well as for the core and accessory genome for each symbiont.
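The normalization by the expected ratio can be illustrated with a short Python sketch. It enumerates all nine single-nucleotide changes of each codon, classifies them with the standard codon table, and divides the observed ratio by the expected one. The function names are hypothetical, and two simplifications are assumptions of this sketch rather than details from the original method: all possible changes are weighted equally, and changes to a stop codon are counted as nonsynonymous.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_counts(codons):
    """Count possible nonsynonymous and synonymous single-base changes."""
    nonsyn = syn = 0
    for codon in codons:
        amino_acid = CODON_TABLE[codon]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == amino_acid:
                    syn += 1
                else:
                    nonsyn += 1  # includes changes to a stop codon (assumption)
    return nonsyn, syn


def pn_ps(observed_nonsyn, observed_syn, codons):
    """Observed N/S ratio divided by the expected N/S ratio."""
    exp_nonsyn, exp_syn = expected_counts(codons)
    if observed_syn == 0 or exp_syn == 0 or exp_nonsyn == 0:
        return float("nan")  # ratio undefined
    return (observed_nonsyn / observed_syn) / (exp_nonsyn / exp_syn)


# Hypothetical two-codon gene with 5 nonsynonymous and 2 synonymous SNVs.
codons = ["ATG", "GCT"]
print(expected_counts(codons), pn_ps(5, 2, codons))
```

A value below 1 indicates fewer nonsynonymous polymorphisms than expected if all mutations were equally likely to be observed.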

The scripts to calculate these statistics have been deposited at https://github.com/deropi/BathyBrooksiSymbionts.

Reconstruction of Strain Gene Content

To assign the accessory genes to particular strains, we first identified samples with a dominant strain, that is, a strain with a frequency of at least 0.7. In these samples, accessory genes with a coverage of at least 50% of the median core coverage were assigned to the dominant strain. Strain-assigned genomes were reconstructed by merging all the genes assigned to the same strain across samples. Clade-assigned genomes were inferred by merging all the genes assigned to strains of a particular clade. To determine whether the symbiont pangenomes are open or closed, we estimated the fit of our reconstructed strain pangenomes to Heaps' law using the micropan package in R (Tettelin et al. 2008; Snipen and Liland 2015).
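The assignment rule described above can be sketched in Python as follows. The data layout, function name, and all values are hypothetical; only the two thresholds (strain frequency of at least 0.7, gene coverage of at least 50% of the median core coverage) come from the text.

```python
def assign_genes_to_strains(samples, min_strain_freq=0.7, min_cov_ratio=0.5):
    """Merge accessory genes of dominant strains across samples.

    Each sample is a dict with (hypothetical) keys:
      "strain_freqs":  strain name -> frequency in the sample
      "core_cov":      median core coverage of the sample
      "accessory_cov": accessory gene name -> coverage in the sample
    """
    strain_genes = {}
    for sample in samples:
        strain, freq = max(sample["strain_freqs"].items(), key=lambda kv: kv[1])
        if freq < min_strain_freq:
            continue  # no dominant strain in this sample
        threshold = min_cov_ratio * sample["core_cov"]
        genes = {g for g, cov in sample["accessory_cov"].items() if cov >= threshold}
        strain_genes.setdefault(strain, set()).update(genes)
    return strain_genes


samples = [
    {"strain_freqs": {"S1": 0.9, "S2": 0.1}, "core_cov": 100.0,
     "accessory_cov": {"geneA": 80.0, "geneB": 20.0}},
    {"strain_freqs": {"S1": 0.5, "S2": 0.5}, "core_cov": 50.0,
     "accessory_cov": {"geneC": 50.0}},
    {"strain_freqs": {"S1": 0.75, "S3": 0.25}, "core_cov": 40.0,
     "accessory_cov": {"geneB": 20.0, "geneD": 5.0}},
]
strain_genes = assign_genes_to_strains(samples)
print({strain: sorted(genes) for strain, genes in strain_genes.items()})
```

The second sample has no dominant strain and is skipped, so its genes remain unassigned; clade-level gene sets would follow by taking the union over the strains of a clade.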

Functional Annotation and Overrepresentation Analysis

Functional annotation by COGs was determined by eggNOG-mapper v2 (Cantalapiedra et al. 2021). Genes with multiple COGs contribute to the counts of each of the COGs. The overrepresentation of gene categories in either core or accessory genomes was tested by performing a Fisher’s exact test for each COG category, followed by FDR (false discovery rate) correction of the P-values.
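A per-category overrepresentation test with FDR correction could look like the following standard-library Python sketch. The function names are hypothetical. Two choices are assumptions of this sketch, since the text does not specify them: the Fisher test is one-sided (testing for enrichment), and the FDR correction follows the Benjamini-Hochberg procedure.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test, P(X >= k) under the hypergeometric model.

    N: all genes, K: genes in the COG category,
    n: genes in the tested set (e.g. accessory), k: tested genes in the category.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_single = fisher_enrichment_p(k=1, n=1, K=1, N=2)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(p_single, [round(p, 4) for p in adjusted])
```

Because a gene with several COG annotations is counted in each of its categories, the per-category tests are not independent, which is one reason a multiple-testing correction is applied.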

Transposase genes were identified as genes containing the substring “Transposase” or “transposase” and integrase genes were identified as genes containing the substring “Integrase” or “integrase”. Additionally, we used HMMer (biosequence analysis using profile hidden Markov models) (Eddy 2011, 2019) to screen for functional domains with hmmsearch (e-value <1e−4) against Pfam (Mistry et al. 2021) that are related to restriction-modification and CRISPR-Cas. The list of Pfam accessions used is provided in supplementary table S8, Supplementary Material online, where restriction-modification profiles have been taken from (Croucher et al. 2014) and CRISPR-Cas profiles were directly extracted by looking for the keyword “CRISPR-Cas” in the Pfam portal (http://pfam.xfam.org/search).
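The keyword rule for transposases and integrases amounts to a simple substring match, sketched here in Python. The function name and the example annotation strings are hypothetical, and it is an assumption of this sketch that the substrings are searched in the functional annotation text of each gene.

```python
def classify_mobile_element(annotation):
    """Return the set of labels whose keyword occurs in the annotation."""
    labels = set()
    if "Transposase" in annotation or "transposase" in annotation:
        labels.add("transposase")
    if "Integrase" in annotation or "integrase" in annotation:
        labels.add("integrase")
    return labels


# Hypothetical annotation strings.
examples = [
    "IS4 family transposase",
    "Integrase core domain protein",
    "methane monooxygenase subunit",
]
for text in examples:
    print(text, sorted(classify_mobile_element(text)))
```

Matching only the two capitalization variants named in the text means that an all-uppercase annotation such as "TRANSPOSASE" would not be counted.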

Phage Identification

All contigs of the 19 samples were screened for viral sequences in two steps. First, contigs with VirSorter categories 1, 2, and 4 were retained (Roux et al. 2015). Second, genes on those contigs were searched with hmmscan (e-value <1e−5) (Eddy 2011, 2019) against pVOGs (Prokaryotic Virus Orthologous Groups) (Grazziotin et al. 2017), and contigs were considered viral if they had at least three pVOGs, of which at least two had a viral quotient of at least 0.8. Phage contigs were clustered with public phages using vConTACT2 (Jang et al. 2019). Average amino acid identity (AAI) was calculated with CompareM (Parks 2020).
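The second filtering step can be written as a small decision rule, sketched here in Python. The function name, the pVOG identifiers, and the viral quotients are hypothetical; the sketch assumes that distinct pVOGs per contig are counted and that the hmmscan hits were already filtered by e-value.

```python
def is_viral_contig(pvog_quotients, min_pvogs=3, min_high=2, min_vq=0.8):
    """Apply the pVOG rule to one contig.

    pvog_quotients maps each distinct pVOG hit on the contig to its
    viral quotient.
    """
    high = sum(1 for vq in pvog_quotients.values() if vq >= min_vq)
    return len(pvog_quotients) >= min_pvogs and high >= min_high


# Hypothetical contigs.
contig_a = {"VOG0001": 0.90, "VOG0002": 0.85, "VOG0003": 0.20}
contig_b = {"VOG0001": 0.90, "VOG0002": 0.85}
contig_c = {"VOG0001": 0.90, "VOG0004": 0.50, "VOG0003": 0.20}
print(is_viral_contig(contig_a), is_viral_contig(contig_b), is_viral_contig(contig_c))
```

Requiring both a minimum number of pVOGs and a minimum number of highly virus-specific pVOGs guards against contigs that carry only a few genes shared between phages and bacteria.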

Transposase Genetic Distance Estimation

To estimate the degree of divergence among transposases in the population, pairwise alignments between all pairs of transposases were reconstructed with MAFFT (multiple sequence alignment based on fast Fourier transform) in auto mode (Katoh and Standley 2013), and pairwise distances were estimated with the K80 model implemented in the R package ape (Paradis and Schliep 2019).
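For orientation, the K80 (Kimura two-parameter) distance between two aligned sequences is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitions and transversions. The Python sketch below implements this formula directly; it is not the ape implementation, the function name is hypothetical, and skipping alignment columns with gaps or ambiguous bases is an assumption of the sketch.

```python
from math import log

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ASSUMPTION: columns with gaps or ambiguity are skipped
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    term1, term2 = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        return float("inf")  # sequences too divergent for the model
    return -0.5 * log(term1) - 0.25 * log(term2) + 0.0


# One transition (A<->G) in ten aligned sites.
d = k80_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(round(d, 4))
```

Unlike the raw proportion of differing sites, the K80 distance corrects for multiple substitutions at the same site and treats transitions and transversions separately.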

Homologous Proteins between SOX and MOX

To identify homologous proteins, we extracted reciprocal best blast (Altschul et al. 1990) hits between the translated genes in the SOX and MOX pangenomes. Then, full-length protein sequence alignments were reconstructed with the Needleman-Wunsch algorithm implemented in EMBOSS (European Molecular Biology Open Software Suite) (Needleman and Wunsch 1970; Rice et al. 2000), and pairs of proteins with pairwise identities of at least 30% were assigned as homologous.
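The two-step homolog definition (reciprocal best hits, then a global identity threshold) can be sketched in Python as follows. The function names, the tuple layout of the hit lists, and all values are hypothetical; ranking hits by bit score is an assumption of this sketch, and the global identities are taken as given rather than computed.

```python
def best_hits(hits):
    """hits: iterable of (query, subject, bitscore); return query -> best subject."""
    best = {}
    for query, subject, score in hits:
        # ASSUMPTION: the best hit has the highest bit score; ties keep the first.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(sox_vs_mox, mox_vs_sox):
    forward = best_hits(sox_vs_mox)
    reverse = best_hits(mox_vs_sox)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)


def homologous_pairs(rbh_pairs, global_identity, min_identity=30.0):
    """Keep pairs whose full-length alignment identity (%) reaches the threshold."""
    return [pair for pair in rbh_pairs if global_identity.get(pair, 0.0) >= min_identity]


# Hypothetical BLAST hits in both directions.
sox_vs_mox = [("sox1", "mox1", 200.0), ("sox1", "mox2", 150.0), ("sox2", "mox2", 90.0)]
mox_vs_sox = [("mox1", "sox1", 198.0), ("mox2", "sox1", 140.0)]
rbh = reciprocal_best_hits(sox_vs_mox, mox_vs_sox)
homologs = homologous_pairs(rbh, {("sox1", "mox1"): 42.5})
print(rbh, homologs)
```

Here sox2 finds mox2 as its best hit, but mox2 prefers sox1, so the pair is not reciprocal and is discarded before the identity filter is applied.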

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.

Acknowledgments

We would like to thank Nicole Dubilier for the collection of the original data and valuable discussions on the manuscript. We thank Marina Khachaturyan for comments on the manuscript and Franz Baumdicker for discussions on gene content diversity statistics. This work was supported by the CRC1182 Origin and Function of Metaorganisms. We acknowledge financial support by DFG within the funding programme Open Access Publikationskosten.

Contributor Information

Devani Romero Picazo, Genomic Microbiology Group, Institute of General Microbiology, Christian-Albrechts University, 24118 Kiel, Germany.

Almut Werner, Genomic Microbiology Group, Institute of General Microbiology, Christian-Albrechts University, 24118 Kiel, Germany.

Tal Dagan, Genomic Microbiology Group, Institute of General Microbiology, Christian-Albrechts University, 24118 Kiel, Germany.

Anne Kupczok, Genomic Microbiology Group, Institute of General Microbiology, Christian-Albrechts University, 24118 Kiel, Germany; Max Planck Institute for Marine Microbiology, 28359 Bremen, Germany; Bioinformatics Group, Wageningen University & Research, 6708PB Wageningen, The Netherlands.

Data availability

The MAGs of both symbionts are available at NCBI under the BioProject ID PRJNA508280 with BioSample IDs SAMN21876924 - SAMN21876961. The genome of the virus Gokushovirinae sp. isolate VC_68_0 is available under the GenBank accession OL437471.1. The protein sequences for both symbiont pangenomes have been deposited in the GitHub repository https://github.com/deropi/BathyBrooksiSymbionts.

Literature Cited

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410. doi: 10.1016/S0022-2836(05)80360-2
  2. Ansorge R, et al. 2019. Functional diversity enables multiple symbiont strains to coexist in deep-sea mussels. Nat Microbiol. 4:2487–2497. doi: 10.1038/s41564-019-0572-9
  3. Ansorge R, et al. 2020. The hidden pangenome: comparative genomics reveals pervasive diversity in symbiotic and free-living sulfur-oxidizing bacteria. https://www.biorxiv.org/content/10.1101/2020.12.11.421487v1 (Accessed March 20, 2021).
  4. Beaulaurier J, et al. 2018. Metagenomic binning and association of plasmids with bacterial host genomes using DNA methylation. Nat Biotechnol. 36:61–69. doi: 10.1038/nbt.4037
  5. Bohlin J, Eldholm V, Pettersson JHO, Brynildsrud O, Snipen L. 2017. The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes. BMC Genomics 18:151. doi: 10.1186/s12864-017-3543-7
  6. Bongrand C, et al. 2022. Evidence of genomic diversification in a natural symbiotic population within its host. Front Microbiol. 13:854355. doi: 10.3389/fmicb.2022.854355
  7. Bordenstein SR, Reznikoff WS. 2005. Mobile DNA in obligate intracellular bacteria. Nat Rev Microbiol. 3:688–699. doi: 10.1038/nrmicro1233
  8. Breusing C, Genetti M, Russell SL, Corbett-Detig RB, Beinart RA. 2022. Horizontal transmission enables flexible associations with locally adapted symbiont strains in deep-sea hydrothermal vent symbioses. Proc Natl Acad Sci U S A. 119:e2115608119. doi: 10.1073/pnas.2115608119
  9. Bright M, Bulgheresi S. 2010. A complex journey: transmission of microbial symbionts. Nat Rev Microbiol. 8:218–30. doi: 10.1038/nrmicro2262
  10. Brockhurst MA, et al. 2019. The ecology and evolution of Pangenomes. Curr Biol. 29:R1094–R1103. doi: 10.1016/j.cub.2019.08.012
  11. Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. 2021. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 38:5825–5829. doi: 10.1093/molbev/msab293
  12. Castelle CJ, Banfield JF. 2018. Major new microbial groups expand diversity and alter our understanding of the tree of life. Cell 172:1181–1197. doi: 10.1016/j.cell.2018.02.016
  13. Castillo-Ramírez S. 2022. Beyond microbial core genomic epidemiology: towards pan genomic epidemiology. Lancet Microbe 3:e244–e245. doi: 10.1016/S2666-5247(22)00058-1
  14. Chen J. 2018. GUniFrac: Generalized UniFrac Distances. https://CRAN.R-project.org/package=GUniFrac.
  15. Conrad RE, et al. 2022. Toward quantifying the adaptive role of bacterial pangenomes during environmental perturbations. ISME J. 16:1222–1234. doi: 10.1038/s41396-021-01149-9
  16. Couvin D, et al. 2018. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 46:W246–W251. doi: 10.1093/nar/gky425
  17. Croucher NJ, et al. 2014. Diversification of bacterial genome content through distinct mechanisms over different timescales. Nat Commun. 5:5471. doi: 10.1038/ncomms6471
  18. Delmont TO, Eren AM. 2018. Linking pangenomes and metagenomes: the Prochlorococcus metapangenome. PeerJ 6:e4320. doi: 10.7717/peerj.4320
  19. Denef VJ. 2019. Peering into the genetic makeup of natural microbial populations using metagenomics. In: Polz MF, Rajora OP, editors. Population genomics: microorganisms. Cham: Population Genomics Springer International Publishing. p. 49–75.
  20. Ding W, Baumdicker F, Neher RA. 2018. panX: pan-genome analysis and exploration. Nucleic Acids Res. 46:e5. doi: 10.1093/nar/gkx977
  21. Duperron S, et al. 2007. Diversity, relative abundance and metabolic potential of bacterial endosymbionts in three Bathymodiolus mussel species from cold seeps in the Gulf of Mexico. Environ Microbiol. 9:1423–1438. doi: 10.1111/j.1462-2920.2007.01259.x
  22. Eddy SR. 2011. Accelerated profile HMM searches. PLOS Comput Biol. 7:e1002195. doi: 10.1371/journal.pcbi.1002195
  23. Eddy SR. 2019. HMMER 3.3. http://hmmer.org/.
  24. Ellegaard KM, Engel P. 2019. Genomic diversity landscape of the honey bee gut microbiota. Nat Commun. 10:446. doi: 10.1038/s41467-019-08303-0
  25. Faure B, Schaeffer SW, Fisher CR. 2015. Species distribution and population connectivity of deep-sea mussels at hydrocarbon seeps in the Gulf of Mexico. PloS One 10:e0118460. doi: 10.1371/journal.pone.0118460
  26. Franke M, Geier B, Hammel JU, Dubilier N, Leisch N. 2021. Coming together—symbiont acquisition and early development in deep-sea bathymodioline mussels. Proc R Soc B Biol Sci. 288:20211044. doi: 10.1098/rspb.2021.1044
  27. Furuta Y, et al. 2011. Birth and death of genes linked to chromosomal inversion. Proc Natl Acad Sci U S A. 108:1501–1506. doi: 10.1073/pnas.1012579108
  28. Garud NR, Good BH, Hallatschek O, Pollard KS. 2019. Evolutionary dynamics of bacteria in the gut microbiome within and across hosts. PLOS Biol. 17:e3000102. doi: 10.1371/journal.pbio.3000102
  29. Giovannoni SJ, Cameron Thrash J, Temperton B. 2014. Implications of streamlining theory for microbial ecology. ISME J. 8:1553–1565. doi: 10.1038/ismej.2014.60
  30. Graña-Miraglia L, et al. 2017. Rapid gene turnover as a significant source of genetic variation in a recently seeded population of a healthcare-associated pathogen. Front Microbiol. 8:1817. doi: 10.3389/fmicb.2017.01817
  31. Grazziotin AL, Koonin EV, Kristensen DM. 2017. Prokaryotic virus orthologous groups (pVOGs): a resource for comparative genomics and protein family annotation. Nucleic Acids Res. 45:D491–D498. doi: 10.1093/nar/gkw975
  32. Greenlon A, et al. 2019. Global-level population genomics reveals differential effects of geography and phylogeny on horizontal gene transfer in soil bacteria. Proc Natl Acad Sci U S A. 116:15200–15209. doi: 10.1073/pnas.1900056116
  33. Guyomar C, et al. 2018. Multi-scale characterization of symbiont diversity in the pea aphid complex through metagenomic approaches. Microbiome 6:181. doi: 10.1186/s40168-018-0562-9
  34. Hall JPJ, Brockhurst MA, Harrison E. 2017. Sampling the mobile gene pool: innovation via horizontal gene transfer in bacteria. Philos Trans R Soc B 372:20160424. doi: 10.1098/rstb.2016.0424
  35. Helmus MR, Bland TJ, Williams CK, Ives AR. 2007. Phylogenetic measures of biodiversity. Am Nat. 169:E68–E83. doi: 10.1086/511334
  36. Hirayama H, et al. 2022. Multispecies populations of methanotrophic methyloprofundus and cultivation of a likely dominant species from the Iheya north deep-sea hydrothermal field. Appl Environ Microbiol. 88:e00758-21. doi: 10.1128/AEM.00758-21
  37. Ikuta T, et al. 2016. Heterogeneous composition of key metabolic gene clusters in a vent mussel symbiont population. ISME J. 10:990–1101. doi: 10.1038/ismej.2015.176
  38. Jang HB, et al. 2019. Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nat Biotechnol. 37:632–639. doi: 10.1038/s41587-019-0100-8
  39. Kashtan N, et al. 2014. Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus. Science 344:416–420. doi: 10.1126/science.1248575
  40. Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 30:772–780. doi: 10.1093/molbev/mst010
  41. Kembel SW, et al. 2010. Picante: R tools for integrating phylogenies and ecology. Bioinformatics 26:1463–1464. doi: 10.1093/bioinformatics/btq166
  42. Kislyuk AO, Haegeman B, Bergman NH, Weitz JS. 2011. Genomic fluidity: an integrative view of gene diversity within microbial populations. BMC Genomics 12:32. doi: 10.1186/1471-2164-12-32
  43. Kloesges T, Popa O, Martin W, Dagan T. 2011. Networks of gene sharing among 329 proteobacterial genomes reveal differences in lateral gene transfer frequency at different phylogenetic depths. Mol Biol Evol. 28:1057–1074. doi: 10.1093/molbev/msq297
  44. Koonin EV, Wolf YI. 2008. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 36:6688–6719. doi: 10.1093/nar/gkn668
  45. Köstlbacher S, et al. 2021. Pangenomics reveals alternative environmental lifestyles among chlamydiae. Nat Commun. 12:4021. doi: 10.1038/s41467-021-24294-3
  46. Laming SR, Gaudron SM, Duperron S. 2018. Lifecycle ecology of deep-sea chemosymbiotic mussels: a review. Front Mar Sci. 5:282. doi: 10.3389/fmars.2018.00282
  47. Levade I, et al. 2017. Vibrio cholerae genomic diversity within and between patients. Microb Genomics 3:000142. doi: 10.1099/mgen.0.000142
  48. Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ArXiv13033997 Q-BioGN. http://arxiv.org/abs/1303.3997 (Accessed August 2, 2016).
  49. Liao J, et al. 2021. Nationwide genomic atlas of soil-dwelling listeria reveals effects of selection and population ecology on pangenome evolution. Nat Microbiol. 6:1021–1030. doi: 10.1038/s41564-021-00935-7
  50. Lo W-S, Huang Y-Y, Kuo C-H. 2016. Winding paths to simplicity: genome evolution in facultative insect symbionts. FEMS Microbiol Rev. 40:855–874. doi: 10.1093/femsre/fuw028
  51. López-Madrigal S, Gil R. 2017. Et tu, Brute? Not even intracellular mutualistic symbionts escape horizontal gene transfer. Genes 8:247. doi: 10.3390/genes8100247
  52. Lorion J, et al. 2013. Adaptive radiation of chemosymbiotic deep-sea mussels. Proc R Soc B 280:20131243. doi: 10.1098/rspb.2013.1243
  53. Maguire F, Jia B, Gray KL. 2020. Metagenome-assembled genome binning methods with short reads disproportionately fail for plasmids and genomic Islands. Microb Genomics 6:12. doi: 10.1099/mgen.0.000436
  54. Maistrenko OM, et al. 2020. Disentangling the impact of environmental and phylogenetic constraints on prokaryotic within-species diversity. ISME J. 14:1247–1259. doi: 10.1038/s41396-020-0600-z
  55. Mateo-Estrada V, et al. 2021. Accessory genomic epidemiology of cocirculating Acinetobacter baumannii clones. mSystems 6:e00626-21. doi: 10.1128/mSystems.00626-21
  56. Merhej V, Royer-Carenzi M, Pontarotti P, Raoult D. 2009. Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol Direct 4:13. doi: 10.1186/1745-6150-4-13
  57. Meziti A, et al. 2021. The reliability of metagenome-assembled genomes (MAGs) in representing natural populations: insights from comparing MAGs against isolate genomes derived from the same fecal sample. Appl Environ Microbiol. 87:e02593-20. doi: 10.1128/AEM.02593-20
  58. Minh BQ, et al. 2020. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 37:1530–1534. doi: 10.1093/molbev/msaa015
  59. Mistry J, et al. 2021. Pfam: the protein families database in 2021. Nucleic Acids Res. 49:D412–D419. doi: 10.1093/nar/gkaa913
  60. Moulana A, Anderson RE, Fortunato CS, Huber JA. 2020. Selection is a significant driver of gene gain and loss in the pangenome of the bacterial genus sulfurovum in geographically distinct deep-sea hydrothermal vents. mSystems 5:e00673-19. doi: 10.1128/mSystems.00673-19
  61. Needleman SB, Wunsch CD. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 48:443–453. doi: 10.1016/0022-2836(70)90057-4
  62. Nelson WC, Tully BJ, Mobberley JM. 2020. Biases in genome reconstruction from metagenomic data. PeerJ 8:e10119. doi: 10.7717/peerj.10119
  63. Newton ILG, Bordenstein SR. 2011. Correlations between bacterial ecology and mobile DNA. Curr Microbiol. 62:198–208. doi: 10.1007/s00284-010-9693-3
  64. Nielsen HB, et al. 2014. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat Biotechnol. 32:822–828. doi: 10.1038/nbt.2939
  65. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. 2017. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27:824–834. doi: 10.1101/gr.213959.116
  66. Nyholm SV, Graf J. 2012. Knowing your friends: invertebrate innate immunity fosters beneficial bacterial symbioses. Nat Rev Microbiol. 10:815–827. doi: 10.1038/nrmicro2894
  67. Paradis E, Schliep K. 2019. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35:526–528. doi: 10.1093/bioinformatics/bty633
  68. Parks D. 2020. dparks1134/CompareM. https://github.com/dparks1134/CompareM (Accessed November 20, 2020).
  69. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25:1043–1055. doi: 10.1101/gr.186072.114
  70. Perreau J, Moran NA. 2022. Genetic innovations in animal–microbe symbioses. Nat Rev Genet. 23:23–39. doi: 10.1038/s41576-021-00395-z
  71. Pinto-Carbó M, et al. 2016. Evidence of horizontal gene transfer between obligate leaf nodule symbionts. ISME J. 10:2092–2105. doi: 10.1038/ismej.2016.27
  72. Polz MF, Alm EJ, Hanage WP. 2013. Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends Genet. 29:170–175. doi: 10.1016/j.tig.2012.12.006
  73. Ponnudurai R, et al. 2017. Metabolic and physiological interdependencies in the Bathymodiolus azoricus symbiosis. ISME J. 11:463–477. doi: 10.1038/ismej.2016.124
  74. Popa O, Hazkani-Covo E, Landan G, Martin W, Dagan T. 2011. Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Res. 21:599–609. doi: 10.1101/gr.115592.110
  75. Quince C, et al. 2017. DESMAN: a new tool for de novo extraction of strains from metagenomes. Genome Biol. 18:181. doi: 10.1186/s13059-017-1309-9
  76. Rice P, Longden I, Bleasby A. 2000. EMBOSS: the European molecular biology open software suite. Trends Genet. 16:276–277. doi: 10.1016/S0168-9525(00)02024-2
  77. Robbins SJ, et al. 2021. A genomic view of the microbiome of coral reef demosponges. ISME J. 15:1641–1654. doi: 10.1038/s41396-020-00876-9
  78. Rocha EPC, et al. 2006. Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 239:226–235. doi: 10.1016/j.jtbi.2005.08.037
  79. Romero Picazo D, et al. 2019. Horizontally transmitted symbiont populations in deep-sea mussels are genetically isolated. ISME J. 13:2954–2968. doi: 10.1038/s41396-019-0475-z
  80. Rossum TV, Ferretti P, Maistrenko OM, Bork P. 2020. Diversity within species: interpreting strains in microbiomes. Nat Rev Microbiol. 18:491–506. doi: 10.1038/s41579-020-0368-1
  81. Roux S, Enault F, Hurwitz BL, Sullivan MB. 2015. VirSorter: mining viral signal from microbial genomic data. PeerJ 3:e985. doi: 10.7717/peerj.985
  82. Russell SL. 2019. Transmission mode is associated with environment type and taxa across bacteria-eukaryote symbioses: a systematic review and meta-analysis. FEMS Microbiol Lett. 366:fnz013. doi: 10.1093/femsle/fnz013
  83. Russell SL, et al. 2020. Horizontal transmission and recombination maintain forever young bacterial symbiont genomes. PLOS Genet. 16:e1008935. doi: 10.1371/journal.pgen.1008935
  84. Sayavedra L, et al. 2015. Abundant toxin-related genes in the genomes of beneficial symbionts from deep-sea hydrothermal vent mussels. eLife 4:e07966. doi: 10.7554/eLife.07966
  85. Schloissnig S, et al. 2013. Genomic variation landscape of the human gut microbiome. Nature 493:45–50. doi: 10.1038/nature11711
  86. Scholz M, et al. 2020. Large scale genome reconstructions illuminate Wolbachia evolution. Nat Commun. 11:5235. doi: 10.1038/s41467-020-19016-0
  87. Sheridan PO, et al. 2020. Gene duplication drives genome expansion in a major lineage of Thaumarchaeota. Nat Commun. 11:5494. doi: 10.1038/s41467-020-19132-x
  88. Snipen L, Liland KH. 2015. micropan: an R-package for microbial pan-genomics. BMC Bioinformatics 16:79. doi: 10.1186/s12859-015-0517-0
  89. Tettelin H, Riley D, Cattuto C, Medini D. 2008. Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol. 11:472–477. doi: 10.1016/j.mib.2008.09.006
  90. Treangen TJ, Rocha EPC. 2011. Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genet. 7:e1001284. doi: 10.1371/journal.pgen.1001284
  91. Tria FDK, Martin WF. 2021. Gene duplications are at least 50 times less frequent than gene transfers in prokaryotic genomes. Genome Biol Evol. 13:evab224. doi: 10.1093/gbe/evab224
  92. Utter DR, Borisy GG, Eren AM, Cavanaugh CM, Mark Welch JL. 2020. Metapangenomics of the oral microbiome provides insights into habitat adaptation and cultivar diversity. Genome Biol. 21:293. doi: 10.1186/s13059-020-02200-2
  93. Waterworth SC, et al. 2020. Horizontal gene transfer to a defensive symbiont with a reduced genome in a multipartite beetle microbiome. mBio 11:e02430-19. doi: 10.1128/mBio.02430-19
  94. Wentrup C, Wendeberg A, Schimak M, Borowski C, Dubilier N. 2014. Forever competent: deep-sea bivalves are colonized by their chemosynthetic symbionts throughout their lifetime. Environ Microbiol. 16:3699–3713. doi: 10.1111/1462-2920.12597
  95. Wilm A, et al. 2012. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 40:11189–11201. doi: 10.1093/nar/gks918
  96. Won Y-J, et al. 2003. Environmental acquisition of thiotrophic endosymbionts by deep-sea mussels of the genus Bathymodiolus. Appl Environ Microbiol. 69:6785–6792. doi: 10.1128/AEM.69.11.6785-6792.2003
  97. Won Y-J, Jones WJ, Vrijenhoek RC. 2008. Absence of cospeciation between deep-sea mytilids and their thiotrophic endosymbionts. J Shellfish Res. 27:129–138. doi: 10.2983/0730-8000(2008)27[129:AOCBDM]2.0.CO;2
  98. Yaffe E, Relman DA. 2020. Tracking microbial evolution in the human gut using Hi-C reveals extensive horizontal gene transfer, persistence and adaptation. Nat Microbiol. 5:343–353. doi: 10.1038/s41564-019-0625-0
  99. Zheng H, Dietrich C, Hongoh Y, Brune A. 2016. Restriction-modification systems as mobile genetic elements in the evolution of an intracellular symbiont. Mol Biol Evol. 33:721–725. doi: 10.1093/molbev/msv264

Articles from Genome Biology and Evolution are provided here courtesy of Oxford University Press
