PLOS Genetics. 2020 Aug 25;16(8):e1008935. doi: 10.1371/journal.pgen.1008935

Horizontal transmission and recombination maintain forever young bacterial symbiont genomes

Shelbi L Russell 1,2,*, Evan Pepper-Tunick 2,3, Jesper Svedberg 2,3, Ashley Byrne 1, Jennie Ruelas Castillo 1, Christopher Vollmers 2,3, Roxanne A Beinart 4, Russell Corbett-Detig 2,3,*
Editor: Xavier Didelot 5
PMCID: PMC7473567  PMID: 32841233

Abstract

Bacterial symbionts bring a wealth of functions to the associations they participate in, but by doing so, they endanger the genes and genomes underlying these abilities. When bacterial symbionts become obligately associated with their hosts, their genomes are thought to decay towards an organelle-like fate due to decreased homologous recombination and inefficient selection. However, numerous associations exist that counter these expectations, especially in marine environments, possibly due to ongoing horizontal gene flow. Despite extensive theoretical treatment, no empirical study thus far has connected these underlying population genetic processes with long-term evolutionary outcomes. By sampling marine chemosynthetic bacterial-bivalve endosymbioses that range from primarily vertical to strictly horizontal transmission, we tested this canonical theory. We found that transmission mode strongly predicts homologous recombination rates, and that exceedingly low recombination rates are associated with moderate genome degradation in the marine symbionts with nearly strict vertical transmission. Nonetheless, even the most degraded marine endosymbiont genomes are occasionally horizontally transmitted and are much larger than their terrestrial insect symbiont counterparts. Therefore, horizontal transmission and recombination enable efficient natural selection to maintain intermediate symbiont genome sizes and substantial functional genetic variation.

Author summary

Symbiotic associations between bacteria and eukaryotes are ubiquitous in nature and have contributed to the evolution of radically novel phenotypes and niches for the involved partners. New metabolic or physiological capacities that arise in these associations are typically encoded by the bacterial symbiont genomes. However, the association itself endangers the retention of bacterial genomic coding capacity. Endosymbiont genome evolution theory predicts that when bacterial symbionts become restricted to host tissues, their populations cannot remove deleterious mutations efficiently. This ultimately results in their genomes degrading to small, function-poor states, reminiscent of organellar genomes. However, many ancient marine endosymbionts do not fit this prediction, but instead retain relatively large, gene-rich genomes, indicating that the evolutionary dynamics of this process need more thorough characterization. Here we show that ongoing symbiont gene flow via horizontal transmission between bivalve hosts and recombination among divergent gammaproteobacterial symbiont lineages are sufficient to maintain large and dynamic bacterial symbiont genomes. These findings indicate that many obligately associated symbiont genomes may not be as isolated from one another as previously assumed and are not on a one-way path to degradation.

Introduction

Bacterial genomes encode an enormous diversity of functions, which enable them to create radically novel phenotypes when they associate with eukaryotic hosts. However, they are at risk of genome degradation and function loss within these associations. Among the diversity of eukaryotic hosts they inhabit, symbiont genome sizes range from nearly unreduced genomes that are similar to their free-living relatives (~3–5 Mb) to highly reduced genomes that are less than 10% of the size of their free-living ancestors (~0.2–0.6 Mb) [1,2]. The degradation process often leaves genes required for the bacterium’s role in the symbiosis, and removes seemingly vital genes, such as those involved in DNA repair and replication [1]. Although genome erosion may be enabled in some cases due to “streamlining” benefits [3], it is clear that the process can become problematic and can ultimately result in symbiont replacement or supplementation [1,4] to accomplish the full repertoire of functions needed in the combined organism. Thus, to fully understand how symbioses evolve, we must first understand the pressures their genomes experience.

Symbiont genome evolution theory predicts that upon host restriction, bacterial genomes begin the steady and inexorable process of decay due to decreased population sizes and homologous recombination rates, which result in inefficacious natural selection [5]. The transmission bottleneck that occurs when symbionts are passed on to host offspring further exacerbates these dynamics by making deleterious mutations more likely to drift to fixation in the next generation [5,6]. Indeed, many endosymbiont taxa exhibit weak purifying selection at the gene level [7–10]. In the early stages of bacterial symbiont-host associations, deleterious mutations arise on each symbiont chromosome and a portion drift or hitchhike with adaptive mutations to fixation. Subsequently, pseudogenes are lost entirely via deletion [5]. Ultimately, this process is thought to result in an organelle-like genome that is a fraction of the size of its free-living ancestors and has relegated many of its core cellular functions (e.g., DNA replication and repair, cell wall synthesis, etc.) to the host or lost them entirely.

While the degraded genomes of many endosymbionts are consistent with this general theory, such as symbionts of terrestrial sap and xylem-feeding insects [1], a diversity of associations present discrepancies. In particular, most known marine obligate endosymbionts’ genomes are at least one megabase in size and contain diverse coding content [11–17], although more reduced representatives have recently been found [18]. Large symbiont genomes in obligate associations are usually interpreted as reflecting the earliest stages of genome degradation [19,20], which is surprising considering the antiquity of many of these associations (e.g., [21,22]). Furthermore, even the vertically transmitted marine symbionts whose phylogenies mirror those of their hosts have only partially degraded genomes [23]. While important genes have been lost in some lineages, such as the recombination and repair gene recA and transversion mismatch-repair gene mutY [24], their loss did not portend an organelle-like fate.

Symbioses in marine environments exhibit significantly more horizontal transmission between hosts than those in terrestrial environments [25], suggesting that symbiont gene flow between hosts may prevent genome decay by enabling high rates of homologous recombination, efficient natural selection, and the maintenance of highly diverse genome contents. This hypothesis has so far not been tested and it is not known whether marine endosymbionts represent the early stages of genome degradation, as implied by the canonical endosymbiont genome theory, or if on-going gene flow and recombination can stall such genome decay over evolutionary time. Furthermore, although the general process of symbiont genome degradation is well-understood in theory (e.g., [1,2,26]), no empirical study has directly evaluated the role of horizontal transmission and recombination in facilitating efficient natural selection and, thereby, the suppression of degradation. Interspecific-population level comparisons are essential for testing these important and long-standing questions [26].

To determine which evolutionary forces prevent marine endosymbiont genomes from degrading despite host restriction, we leveraged both population and comparative genomics of six marine bacterial-animal symbioses from three host taxa that exhibit modes of transmission across the spectrum from strict horizontal transmission (Bathymodiolus mytilid mussels) [27–29], to mixed mode transmission (solemyid bivalves) [15,30–32], to nearly strict vertical transmission (vesicomyid clams) [23,33–35] (see Fig 1A and S1–S3 Tables). Each of these three groups evolved independently (see Fig 2) to obligately host either a single intracellular gammaproteobacterial symbiont 16S rRNA phylotype within their gill cells or two or more phylotypes, in the case of some mussels [36,37]. These symbionts provide chemosynthetic carbon fixation, either through sulfide or methane-oxidation, to nutritionally support the association [25]. Hosts are nearly completely dependent on symbiont metabolism [38], and the solemyids have lost the majority of their digestive tracts in response [39]. These associations appear to be obligate for the symbionts as well as the hosts because either the symbionts have never been found living independently, e.g., solemyid [31] and vesicomyid symbionts [40], or they have only been found in the host and surrounding environment, e.g., bathymodiolin symbionts [27,41,42]. Both the vesicomyids and solemyids transmit symbionts to their offspring by allocating tens to hundreds of symbiont cells to their broadcast-spawned oocytes [31,33]. While the mechanism of horizontal, host-to-host transfer has not been identified in the Bathymodiolus or solemyid symbionts, signatures of rampant horizontal transmission are evident in the population genetics of both of these groups [15,27,28] and the developmental biology of Bathymodiolus [29].

Fig 1. Nearly-strictly vertically transmitted chemosynthetic endosymbionts exhibit genome erosion despite ongoing horizontal transmission events in their populations.

A) Transmission mode spectrum from strict horizontal transmission to strict vertical transmission, with a diversity of mixed modes, incorporating both strategies, in between. B) Genome sizes from this and previous studies [11–13,92] reveal consistent patterns of moderate genome erosion among the vesicomyid symbionts, but not in the other groups with higher rates of horizontal transmission. C) Mitochondrial and symbiont whole genome genealogies are discordant for all groups, indicating that sufficient amounts of horizontal transmission occur in vertically transmitted vesicomyid populations to erode the association between these cytoplasmic genomes. Maximum likelihood cladograms are midpoint rooted, and nodes below 50% bootstrap support are collapsed. Species are color coded by their symbiont transmission mode as in A).

Fig 2. Chemosynthetic bacterial symbionts and their bivalve hosts exhibit ancient divergence times.

A) Maximum likelihood phylogeny inferred from 108 orthologous protein coding genes and the 16S and 23S rRNA genes (outgroup = Alphaproteobacteria; branch labels = bootstrap support fraction) with RelTime divergence date estimates (node bars = 95% confidence intervals). Host-associated bacteria are listed as symbionts of their host species. Bacterial genome sizes are written to the right of the taxon names in the tip labels to highlight trends in genome size across clades. B) Whole mitochondrial Bayesian phylogeny for bivalves (outgroup = Gastropoda; branch labels = posterior probabilities) with divergence dates co-inferred in Beast2 (node bars = 95% highest posterior densities). In both phylogenies, members of vesicomyid, solemyid, and bathymodiolin (both thioautotrophic and methanotrophic) associations are colored yellow, green, and blue, respectively.

This group of marine symbioses presents an ideal system in which to test the impact of transmission mode and homologous recombination rate on bacterial symbiont genome evolution. The wealth of information known about how the vesicomyid, solemyid, and bathymodiolin symbioses function and evolve makes this a powerful evolutionary model. Furthermore, the similarity of these associations to other invertebrate-bacterial associations in the marine environment, e.g., in mode of reproduction (broadcast spawning), phylogeny (Mollusca-Gammaproteobacteria), immunology (innate), ancestral feeding type (filter or deposit feeders), symbiosis function (nutritional), makes this an ideal microcosm for understanding how symbiont population genetics impact the process of symbiont genome degradation. Here, we use this model study system and a powerful population genomic approach (described in S1 Fig and Sections 1-3 of S1 Text), to show that homologous recombination is ongoing even in the most strictly vertically transmitted associations and may enable the maintenance of large and intermediate genome sizes indefinitely.

Results and discussion

Comparative analyses of host mitochondrial and endosymbiont genome genealogies show strong evidence for horizontal transmission in all six populations. The genealogical discordance shown in Fig 1C indicates that horizontal transmission has occurred in the histories of all six of these populations; however, it does not indicate how much, because concordance is eroded even by exceedingly low rates of horizontal transmission [43], which saturates the signal that genealogies can provide. Despite this apparent similarity in transmission mode, the vesicomyid endosymbiont genomes are approximately one half the size of the solemyid or mytilid endosymbiont genomes (Fig 1B), which themselves are approximately consistent with their free-living ancestors. Nonetheless, at 1–1.2 Mb, the partially degraded vesicomyid endosymbiont genomes are still ten times larger than the smallest terrestrial endosymbionts [1]. While the fossil record and previous phylogenetic analyses indicate that the vesicomyid symbionts and clades of solemyid symbionts have been in continuous host association for long periods of time [21,22,44], and thus genome erosion has been prevented, precise divergence dates were needed to confirm this.

Divergence date estimates for hosts and symbionts indicate that the observed patterns in symbiont genome size have been maintained over many millions of years (Fig 2, S2 Fig, and S4–S6 Tables). Similar to prior work [21], we estimated that the vesicomyid bivalves evolved from their non-symbiotic ancestors around 73 million years ago (mya) (95% highest posterior density (HPD) = 62.59–76.70 mya; Fig 2B). We estimated a similar divergence date for the vesicomyids’ monophyletic symbionts of around 84 mya (95% CI = 42.37–165.09 mya; Fig 2A), which is remarkable given that their loss of DNA repair genes and reduced selection efficacy has likely increased their substitution rate (and may have been accounted for by using a relaxed local molecular clock). The clade of gammaproteobacteria that contains the vesicomyid symbionts, termed the SUP05 clade, also contains the thiotrophic Bathymodiolus symbionts and free-living bacteria with genomes in the range of 1.17–1.71 Mb (Fig 2A), from which the vesicomyid symbionts diverged approximately 220 mya (95% CI = 127–381 mya). This indicates that the vesicomyid symbiont genomes have eroded 58% at most, depending on the symbiont lineage and the ancestral state. Thus, over tens of millions of years of host association, the vesicomyid symbiont genomes have exhibited high degrees of stasis and have only degraded moderately (e.g., in vesicomyid symbionts with 1 Mb vs. 1.2 Mb genomes).

The solemyids present a similar, but more complicated situation, likely owing to their antiquity, as the hosts first appeared in the fossil record more than 400 mya [22], when the ocean basins had much different connectivity [45]. While host-switching and novel (free-living) symbiont acquisition have certainly occurred in Solemyidae, and our data indicate such an event may have happened after S. velum and S. pervernicosa diverged (Fig 2A), it may be relatively rare across geological time. Whole genome mitochondrial and symbiont phylogenies indicate that the North Atlantic Solemya species S. velum and S. elarraichensis are sisters that diverged around 129 mya (95% HPD = 124–152 mya; Fig 2B), and their symbionts likely co-speciated with them around the same time (170 mya, 95% CI = 89–325 mya; Fig 2A). Divergence and subsequent speciation may have been due to the opening of the Atlantic, which occurred contemporaneously (180–200 mya; [46]). Thus, the vertically transmitted S. velum and S. elarraichensis symbionts have maintained genomes similar in size to their free-living relatives over hundreds of millions of years of host association.

Given the ancient ages of these associations (Fig 2) and their non-negligible rates of horizontal transmission (Fig 1C), a finer scaled exploration of symbiont genetic diversity was necessary to characterize the population-level processes influencing genome erosion. Homologous recombination is an important driver of genetic diversity in many bacterial populations [47,48], and could impact host-associated symbiont populations if diverse genotypes co-occur due to horizontal transmission. Novel genotypes are necessary because recombination among clonal strains, e.g., within a host-restricted clonal population of vertically transmitted endosymbionts, has little impact on haplotypic diversity [49]. To explore the opportunity for recombination among divergent clades of chemosynthetic symbionts, we partitioned symbiont genetic variation to between and within-host variation (e.g., Fig 3A and S7 and S8 Tables). Within vesicomyid symbionts, genetic diversity is strongly subdivided by hosts, and nearly all variation distinguishes host populations (Fig 3B). Conversely, for mytilid and solemyid endosymbionts, hosts have little impact, and two endosymbionts within a host are almost as divergent as two from different hosts (Fig 3B).
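The partitioning of symbiont variation into within-host and between-host components can be illustrated with a minimal sketch. It assumes hypothetical, already-aligned symbiont haplotypes grouped by host and uses mean pairwise differences as the diversity measure; it is not the pipeline used in this study (see S1 Text), and the function names and toy sequences are illustrative assumptions.

```python
from itertools import combinations, product


def pairwise_diff(a, b):
    """Number of mismatching sites between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean(values):
    """Arithmetic mean, returning 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def within_between(hosts):
    """Return (mean within-host, mean between-host) pairwise differences.

    hosts: dict mapping a host label to a list of aligned haplotype strings.
    """
    within = [pairwise_diff(a, b)
              for seqs in hosts.values()
              for a, b in combinations(seqs, 2)]
    between = [pairwise_diff(a, b)
               for h1, h2 in combinations(hosts, 2)
               for a, b in product(hosts[h1], hosts[h2])]
    return mean(within), mean(between)


# Hypothetical toy data: symbionts identical within hosts, distinct between hosts
structured = {"host1": ["AAAA", "AAAA"], "host2": ["AATT", "AATT"]}
# Hypothetical toy data: the same two haplotypes mixed evenly across hosts
mixed = {"host1": ["AAAA", "AATT"], "host2": ["AAAA", "AATT"]}

print(within_between(structured))
print(within_between(mixed))
```

In the structured toy case all differences fall between hosts, as described for the vesicomyid symbionts, whereas in the mixed case two symbionts from the same host differ as much as or more than two from different hosts.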

Fig 3. Horizontal transmission and recombination introduce novel alleles into symbiont populations.

A) Model of endosymbiont genotype (pink vs. white) distributions under well-mixed high horizontal transmission rates and differentiated, low horizontal transmission rates. B) Horizontally transmitted mytilid (blue) and mixed mode transmitted solemyid (green) symbionts are well mixed among hosts, whereas the nearly strictly vertically transmitted vesicomyid symbionts (yellow) are highly differentiated among hosts. Error bars = 95% confidence intervals from non-parametric bootstrapping. C) Intrahost population folded allele frequency spectra (AFS) are shaped by access to gene flow, which is enabled by horizontal transmission and recombination. D) Recombination rates are significantly higher in the mytilid (blue) and solemyid (green) symbiont genomes compared to the vesicomyid symbiont genomes (yellow). Error bars = 95% confidence intervals.

The distribution of genetic diversity within a single host is even more striking than the pattern between hosts. Within host individuals, Bathymodiolus endosymbionts are exceptionally genetically diverse and the allele frequency spectra are qualitatively similar to expectations for an equilibrium neutrally-evolving population (Fig 3C left and S3 Fig), consistent with the high genetic diversities reported for other bathymodiolin symbionts [50]. Conversely, solemyid endosymbionts maintain intermediate and more variable within-host genetic diversity, consistent with a mixture of vertical and horizontal transmission (Fig 3C middle and S4 Fig). Finally, vesicomyid endosymbiont populations within hosts are virtually devoid of genetic variability (Fig 3C right and S5 Fig). Therefore, despite their literal encapsulation within host cells, mytilid and solemyid symbionts have abundant opportunities to recombine and create fitter chromosomes whereas vesicomyid symbionts must only rarely encounter genetically differentiated individuals.
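For readers unfamiliar with the folded allele frequency spectrum, the following is a minimal sketch of how one could be tabulated. It assumes a hypothetical list of per-site allele counts from a sample of `n` symbiont genomes (or reads); the function name and input values are illustrative and do not reproduce the variant-calling workflow of this study.

```python
from collections import Counter


def folded_afs(allele_counts, n):
    """Folded allele frequency spectrum.

    allele_counts: per-site counts of one allele among n sampled genomes.
    Returns a list whose entry at index k-1 is the number of sites with
    minor allele count k, for k = 1 .. n // 2. Sites with a minor allele
    count of zero (monomorphic sites) are not reported.
    """
    # Folding: the minor allele count is the smaller of c and n - c
    minor = Counter(min(c, n - c) for c in allele_counts)
    return [minor.get(k, 0) for k in range(1, n // 2 + 1)]


# Hypothetical counts at seven sites in a sample of 10 genomes
counts = [1, 9, 2, 5, 0, 10, 8]
spectrum = folded_afs(counts, 10)
print(spectrum)
```

Folding is used when the ancestral allele is unknown, so counts of 1 and 9 out of 10 both fall into the singleton class.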

Although opportunities are limited for vesicomyid endosymbionts, even relatively infrequent homologous recombination events might drive patterns of genome evolution. We therefore developed a theoretical framework of symbiont evolution during mixed transmission modes. Importantly, our model demonstrates that in some conditions these populations can be approximated using a standard Kingman-coalescent and that horizontal transmission is mechanistically linked to observable recombination events between genetically diverse symbiont genomes (Section 2 of S1 Text and S6 and S7 Figs). We then performed extensive coalescent simulations and used a Random Forest-based regression framework to estimate the effective recombination rates for each population (estimated as the population-scaled recombination rate (rho) per site (l); see S9 Table and Materials and Methods). Although our model is clearly an approximation, the results are generally consistent with our prior expectations. The resulting estimated recombination rates are substantially higher in mytilid and solemyid symbionts than in vesicomyid symbionts (Fig 3D and S7 and S10 Tables), indicating that the potential for recombination within hosts is realized in these species. Given that these recombination events are occurring within symbiont populations (θrecombinant = θgenome), our estimate of rho*l is equivalent to r/m from previous studies of bacterial recombination rates (r/m = rho*l*θrecombinant/θgenome; see [51]). Comparing to r/m values across bacteria and archaea, which range from 0.02 to 63.6 [52], reveals that these symbiont populations have some of the largest effective recombination rates ever reported for bacteria (rho*l from the B. septemdierum symbionts equals 46.3, which is in the 96th percentile of previously measured rates). Despite their lack of many genes normally required for recombination [24], we found evidence for modest rates of recombination within both partially degenerate vesicomyid genomes. This capability may be enabled via “illegitimate” mechanisms, e.g., RecA-independent recombination via slipped-mispairing or single strand annealing [24,53]. For all three symbiont taxa, recombination has a larger impact on genome evolution than mutation (estimated by rho*l/theta in S7 Table), in part due to the relatively low estimates of theta in the symbiont populations. Thus, these bacterial symbionts comprise what might be described as quasi-sexual, rather than clonal, populations.
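The equivalence between rho*l and r/m can be made concrete with a short sketch. It assumes the relation takes the form r/m = rho*l × (θrecombinant / θgenome), so that the two quantities coincide when recombinant tracts come from the same population; the function name and the theta values are hypothetical placeholders, and only the rho*l value of 46.3 is taken from the text.

```python
def r_over_m(rho_l, theta_recombinant, theta_genome):
    """Assumed relation: r/m = rho*l * (theta_recombinant / theta_genome)."""
    if theta_genome <= 0:
        raise ValueError("theta_genome must be positive")
    return rho_l * theta_recombinant / theta_genome


# Within-population recombination: both thetas are equal (placeholder value),
# so r/m reduces to rho*l (46.3 for the B. septemdierum symbionts).
same_population = r_over_m(46.3, 0.01, 0.01)

# Hypothetical contrast: imported tracts half as diverse as the genome
less_diverse_donor = r_over_m(46.3, 0.005, 0.01)

print(same_population, less_diverse_donor)
```

The second call is purely illustrative; it shows that the same rho*l would translate into a smaller r/m if recombining fragments carried less diversity than the recipient genome.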

A fundamental consequence of decreased homologous recombination rates for endosymbionts is that selected mutations cannot be shuffled to form higher fitness chromosomes and remain linked to neutral mutations for longer times. Ultimately, this competition among selected mutations on different haplotypes, termed clonal interference, can drive the fixation of deleterious mutations and reshape genealogies towards long terminal branches and excesses of rare alleles [54] (illustrated in Fig 4A). Similarly, even in the absence of competition among selected haplotypes, recently completed selective sweeps can reshape linked neutral genealogies if recombination is infrequent [55]. Consistent with this theory, we find abundant rare alleles in the vesicomyid symbiont genomes (Fig 4F and 4G and S7 Table; Tajima’s D = -1.98 and -2.03 for C. magnifica and Calyptogena fausta, respectively), but little skew in the allele frequencies of other endosymbiont populations (Fig 4B–4E and S7 Table; D ranges from -2.03 to 1.01). Importantly, it is unlikely that differences in host species demography have driven these differences, e.g., recent population expansions specifically in vesicomyid clams. In fact, we found less allele frequency skew in the mitochondrial genomes than in the vesicomyid symbionts for all species considered, and the strongest negative skew in the allele frequencies in the mitochondrial genome of B. septemdierum from the Lau Basin (D = -1.9, S7 Table). Additionally, relative rates of molecular evolution between symbiont populations follow the expected trend, with dN/dS values of 0.14, 0.096, 0.083 for vesicomyid, solemyid and bathymodiolin genomes, respectively (pairwise Wilcoxon test p-values: bathymodiolin-vesicomyid p = 4.60e-14, solemyid-vesicomyid p = 1.02e-11, and bathymodiolin-solemyid p = 0.0365), consistent with recombination enabling more efficacious purifying selection for sustained periods of time.
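Because the argument above rests on Tajima’s D as a summary of allele frequency skew, a minimal sketch of the standard estimator may help. It computes D from per-site minor allele counts in a sample of `n` sequences, using the usual constants from Tajima's 1989 formulation; the function name and the toy inputs are hypothetical, and this is not the code used to produce the values in S7 Table.

```python
from math import sqrt


def tajimas_d(minor_counts, n):
    """Tajima's D from per-site minor allele counts in a sample of n sequences."""
    if n < 4:
        raise ValueError("the variance term is zero for n < 4")
    seg = [k for k in minor_counts if 0 < k < n]  # keep segregating sites only
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")

    # Mean number of pairwise differences (pi), summed over sites
    pi = sum(2 * k * (n - k) for k in seg) / (n * (n - 1))

    # Harmonic sums and the standard variance constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between pi and Watterson's estimator, scaled by its std. dev.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples of 10 genomes with 5 segregating sites each
rare_excess = tajimas_d([1] * 5, 10)          # all singletons
intermediate_excess = tajimas_d([5] * 5, 10)  # all at 50% frequency
print(rare_excess, intermediate_excess)
```

An excess of rare alleles, as reported for the vesicomyid symbionts, pushes pi below Watterson's estimator and yields a negative D, whereas an excess of intermediate-frequency alleles yields a positive D.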

Fig 4. Consequences of access to gene flow via horizontal transmission and recombination on the distribution of symbiont genetic diversity between hosts.

A) Diagram showing how beneficial alleles (pink) are linked to deleterious alleles (grey) in populations experiencing strong selection on linked sites versus free recombination, and how these processes are reflected in the underlying population genealogies and allele frequency spectra. B-G) Symbiont genealogies and between host allele frequency spectra (AFS) for each host/symbiont species.

Genome structure stasis is thought to be another hallmark of canonical endosymbiont genome evolution, and static, degenerate genomes have been reported for many terrestrial endosymbioses [1]. Whole genome alignments reveal that the vesicomyid symbiont genomes are highly syntenic, with few rearrangements, insertions, or deletions, whereas the other symbiont genomes are far more structurally dynamic (Fig 5 and S8 Fig). Given the role of recombination in altering bacterial genome structure ([56]; illustrated in Fig 5A), and the signals of recombination and linked selected sites in the vesicomyid symbionts (Figs 3D, 4F and 4G, respectively), recombinational processes may partially underlie genome erosion, in addition to preventing it. This could proceed in the following way: first, a rare recombination event induces a deletion that drifts to high frequency in the within-host population (other events can also induce deletions, such as strand slippage [57]). With homologous recombination events occurring so rarely in these endosymbiont genomes, natural selection would be unable to efficiently purge deleterious deletions. Inversions, in contrast, which can be highly mutagenic by inverting the translated strand and inducing replication-transcription machinery collisions [58], are nearly absent, potentially due to their high fitness costs. If symbionts with chromosomes bearing the deletion are exclusively transmitted to offspring during vertical transmission, then the deletion would be fixed in all subsequent symbionts in that host lineage. Multiple instances of this process would incrementally reduce the size of the symbiont genome.

Fig 5. Genome structure is shaped by horizontal transmission and recombination.

A) Models of recombination-based structural mutation mechanisms. B-D) Whole genome alignments for B) sulfur-oxidizing mytilid, C) solemyid, and D) vesicomyid symbiont genome assemblies with >1 Mb scaffolds.

In contrast, the solemyid symbiont genomes exhibit increasing degrees of structural dynamics with increasing divergence time. Highly divergent solemyid symbionts, such as the S. velum and S. pervernicosa symbionts, exhibit genomes that are as structurally dynamic as strictly horizontally transmitted associations (Fig 5B vs 5C). However, over shorter time scales, such as the duration of time since the S. velum and S. elarraichensis symbionts diverged, structural changes appear to be dominated by insertions and deletions (indels; confirmed for 30 Kb segments of the S. elarraichensis symbiont draft genome in S7 Fig). Rapidly evolving indels have also been reported at the species level for S. velum [15]. Potentially underlying indel dynamics, we found that the solemyid symbionts have far more mobile genetic elements than any of the other symbiont genomes (S11 Table). High mobile element content is consistent with a combination of environmentally-exposed and host-associated periods [59], and mirrors the early stages of symbiosis [19,20], as well as the early stages of eukaryotic asexuality [60]. The mobile elements exhibit homology to different environmental bacteria, implying many independent insertion events (S12 Table).

Given the extreme ages of the solemyid and vesicomyid associations (Fig 2), our data suggest that moderate rates of recombination have allowed their symbiont genomes to maintain functional diversity characteristic of a free-living or moderately reduced genome, respectively. Evidence from the insect endosymbionts experiencing extreme genome size reduction indicates that genome erosion is possible over time frames as short as 5–20 million years (e.g., [61–68]). It is plausible that vesicomyid symbionts’ horizontal transmission and recombination rates are at the beginning of the range of values that permit genome decay. Our data indicate that they still undergo horizontal transmission (Fig 1C) and recombine (Fig 3D and S7 Table). Furthermore, signatures of genetic variation consistent with linked selection and sustained intermediate genome sizes indicate that selection is sufficiently efficacious to maintain some functional diversity in these populations, counter to the expectations for the original theory on endosymbiont genome evolution [5,6]. Thus, ample time has passed for these symbiont genomes to erode, but horizontal transmission and recombination have likely prevented it (as depicted in Fig 6).

Fig 6. A conceptual model of the prevention of endosymbiont genome degradation through horizontal transmission and recombination.

Sufficient levels of genetic diversity, which can be introduced via horizontal transmission of symbiont genotypes between hosts and recombination between genotypes in mixed infections, prevent or delay genome degradation by restoring functional versions of mutated or deleted regions. Prevention can continue until recombination capabilities (RecA-dependent and independent) are completely lost, at which point genetic rescue is no longer possible without wholesale symbiont replacement.

Conclusion

Here we empirically show that symbiont genome sizes and functional diversity are predicted by the rate of gene flow into and among symbiont populations via horizontal transmission and homologous recombination. Although we have only investigated three independently evolved associations, we see this system serving as a microcosm for marine associations more generally, as many other associations exhibit similar biologies (e.g., intracellular, autotrophic, broadcast-spawning, etc. [36]) and all are governed by the same population genetic principles. Amazingly, we found that symbiont gene flow between hosts is ongoing in one of the most intimate marine associations, the vertically transmitted vesicomyids. These results suggest that there is a range of possible intermediate genome degradation states that can be maintained over millions of years with sufficient recombination. Therefore, symbiont genome evolution following host restriction is not a one-way, inescapable process that ends in an organelle-like state as it is commonly presented [2,5,6]. These results validate long-standing but untested theory and suggest that the diversity of symbioses found to exhibit intermediate rates of horizontal transmission and incomplete genome degradation may be undergoing similar population-level processes.

Materials and methods

Samples and genomic data production

Sample collection

We obtained chemosynthetic bivalve samples from hydrothermal vents, cold seeps, and reducing coastal sediments from around the world (S1 Table). Calyptogena magnifica samples were collected from the East Pacific Rise (EPR) hydrothermal vent fields between 1998 and 2004. Calyptogena fausta were collected from the Juan de Fuca (JDF) Ridge hydrothermal vent system in 2004. The single Calyptogena extenta specimen was collected from Monterey Canyon in 1995. Solemya pervernicosa samples were obtained from the Santa Monica sewage outfall in 1992 (as in [15]). Solemya velum were collected from Point Judith, RI, as described in [15]. B. childressi were sampled from the Veatch Canyon cold seep off of New England. B. septemdierum was sampled from the ABE and Tu'i Malila hydrothermal vent sites in Lau Basin. All tissue samples were stored at -80°C until sterile dissection or subsection of previously sterile-dissected gill tissue as described in [15].

DNA extraction and Illumina sequencing

We extracted DNA from gill samples for each host individual sampled using Qiagen DNeasy kits following the manufacturer’s instructions. We quantified DNA concentrations using a Qubit dsDNA kit and normalized each sample to 10 ng/µl for Illumina library preparations. We produced the majority of our Illumina sequencing libraries for each sample using a Tn5-based protocol for tagmentation followed by dual-indexing PCR using HiFi DNA polymerase (Kappa Bioscience) and custom primer sequences (IDT) designed to uniquely label both i5 and i7 indexes for each sample (Tn5 enzyme was expressed and purified in-house). Indexed samples were pooled and sequenced on single lanes of a HiSeq 4000. We sequenced a total of four lanes of HiSeq 4000 paired-end 150 bp sequencing across the entire study. Additionally, we obtained a subset of samples for C. magnifica using genomic methods from our previous work ([15], S1 Table). We also used a dataset we have previously collected for S. velum, specifically the population from Point Judith, Rhode Island ([15], S1 Table). The specific library preparation methods and number of read pairs obtained for each sample included in this work are listed in S1 Table.

Nanopore sequencing

For each de novo genome assembly in this work, we selected a representative from each host population based on DNA quality as determined using an Agilent Tapestation and DNA concentration based on Qubit readings. We sequenced each sample on a single MinION flow cell using the ligation-based 1D chemistry, SQK-LSK109 kit per ONT instructions with minor adjustments. The end-repair reaction was incubated for 30 minutes each at 20°C and 65°C and the ligation reaction was performed for 30 minutes instead of the recommended 10 minutes. Read counts obtained and mean read lengths are available in S1 Table. We performed basecalling using the Albacore basecaller v2.0.1 and we discarded the subset of reads whose mean quality score was less than 7. This is the set of reads nominally marked as “failed” by the basecaller software.
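The read-quality cutoff can be illustrated with a minimal sketch. It assumes that a read's mean quality is obtained by averaging per-base error probabilities and converting back to the Phred scale; this convention is an assumption made for illustration and has not been checked against Albacore v2.0.1. The function names are hypothetical.

```python
import math

def mean_qscore(phred_scores):
    """Mean read quality, computed by averaging per-base error probabilities
    (assumed convention) and converting the mean back to a Phred score."""
    error_probs = [10 ** (-q / 10) for q in phred_scores]
    return -10 * math.log10(sum(error_probs) / len(error_probs))

def passes_quality(phred_scores, threshold=7.0):
    """Keep a read when its mean quality is at least the threshold (7 in the text)."""
    return mean_qscore(phred_scores) >= threshold

# Toy per-base Phred scores (illustrative only, not real data)
good_read = [10, 12, 9, 11]
poor_read = [3, 3, 20]
print(passes_quality(good_read), passes_quality(poor_read))
```

Averaging error probabilities rather than raw Phred scores penalizes reads that contain a few very poor bases, which is why the second toy read fails despite containing one high-quality base.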

De novo symbiont genome assembly

Reference genomes for the B. septemdierum, B. childressi, S. pervernicosa, and C. fausta symbionts were assembled using combined Nanopore and Illumina reads. First, we assembled the Nanopore reads using the long-read assembly program wtdbg2 [69] using the “ont” presets option and setting the parameter -k to 15. Then, we performed two rounds of reference genome improvement by aligning Illumina sequencing reads from the same individual to the resulting unfiltered assembly and polishing with the Pilon software package [70]. We used BWA-MEM [71] to align Illumina reads in each subsequent polishing round.

We assembled the C. extenta symbiont genome from an Illumina library prepared for the single sampled individual using IDBA [72] and SPAdes [73]. While both assemblies were highly contiguous (N50 = 596007 and 604961 bp, respectively), the SPAdes assembly was able to merge two contigs that were split in the IDBA assembly, so the two-contig SPAdes assembly was used for downstream analyses. Comparisons of synteny demonstrated that this join was found in other vesicomyid endosymbiont genomes, suggesting it is correct (see below).
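For readers unfamiliar with the N50 contiguity statistic reported above, the following minimal sketch shows how it is conventionally computed from a list of contig lengths. The example lengths are toy values, not those of the assemblies described here.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length

# Toy contig lengths in bp (illustrative only)
toy_assembly = [80, 70, 50, 40, 30, 20]
print(n50(toy_assembly))
```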

Because read mixtures include host genomic DNA, mitochondrial DNA, and genomic DNA from other bacterial species, we then rigorously filtered the resulting contigs to extract only high confidence contigs contributed by bacteria of the study species. To identify symbiont contigs, we called ORFs with Prodigal v2.60 [74] and annotated coding sequences with BLAST [75] as described in [15] (NCBI nr, TrEMBL, and UniProt database accessed on April 7, 2019). We annotated ribosomal RNAs with RNAmmer [76] and transfer RNAs with tRNAscan [77]. Using the taxonomic information encoded in the annotation, we identified contigs that were confidently of symbiont origin. Then, using these contigs, we filtered the remaining contigs by GC content, read coverage, and coding density using custom scripts. Finally, we evaluated the quality of the assemblies with CheckM [78] and by testing for the presence of core bacterial phylogenetic markers [79] (see S2 Table).

Host mitochondrial genome assembly

Mitochondrial genomes were assembled from Nanopore reads, which were subsequently corrected with Illumina data, as described above, or they were assembled from Illumina reads directly. As the different samples contained different mitochondrial coverage, higher short read coverage was often better than lower long read coverage for recovering these genomes. We assembled mitochondrial genomes for C. fausta and B. septemdierum with IDBA [72] using Illumina data from two of the highest depth-of-coverage samples (C. fausta 31 and B. septemdierum 231, respectively). The complexity of the B. septemdierum data prevented IDBA from finishing within a week, so we first removed low coverage nuclear kmers (<10x) with Quake [80]. The C. extenta mitochondrial genome was assembled along with the symbiont genome using SPAdes [73], as described above. We were able to use the Nanopore-based assembly from Bathymodiolus childressi for the mitochondrial genome. Lastly, the complete mitochondrial genome for S. pervernicosa was available from [15], so we did not reassemble it here.

After assembly, we identified the mitochondrial scaffold by blasting the full set of scaffolds against a database containing the currently available set of 19 bivalve mitochondrial genomes. Then, we annotated the mitochondrial genome with MITOS [81]. For mitochondrial genomes lacking conserved genes, we repeated genome assembly and mitochondrial genome identification and annotation with a different sample to verify we obtained the full sequence. See Section 1 of S1 Text for a description of the host species identification verification process.

Short read alignment

After producing endosymbiont and host mitochondrial genome assemblies for each host/endosymbiont species, we aligned short read Illumina data from each individual to a reference genome consisting of both of these genomes. Genomes assembled previously for the C. magnifica symbiont ([11], accession NC_008610.1) and mitochondrion ([82], accession NC_028724.1), and the S. velum symbiont ([13], accession NZ_JRAA00000000.1) and mitochondrion ([83], accession NC_017612.1) were used as references for these populations. We aligned reads with BWA-MEM [71], and we then sorted and removed duplicate reads using the samtools software package [84]. After this, we performed indel realignment for each sample separately using the “IndelRealigner” function within the Genome Analysis Toolkit (GATK) software package [85].

Genotyping and variant filtration for each host individual

We called consensus genotypes for each individual jointly using the GATK “UnifiedGenotyper” option and we ran the program with otherwise default parameters except we required that it output all sites rather than just all variable positions. We filtered variant sites using the vcftools software package [86] largely following the GATK best practices as we have done in our previous work [15]. Briefly, we required that each site have a minimum quality/depth ratio of 2, a maximum Fisher’s strand value of 60, a minimum nominal genotype quality of 20 and a maximum number of reads with mapping quality zero at a putatively variant site of 5. For analyses of within host individual variation for fixation index (Fst) calculations, we also obtained a multiple pileup file using samtools and filtered sites that were not retained after applying these filters. See Section 1 of S1 Text for an estimate of the consensus genotyping error rate.
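The four site-level thresholds listed above can be expressed as a simple predicate. This is a minimal sketch, not the vcftools commands actually used: the dictionary keys (QD, FS, GQ, MQ0) follow common GATK annotation names, and treating each threshold as inclusive is an assumption.

```python
# Thresholds from the text; inclusive comparisons are an assumption.
MIN_QD = 2.0    # minimum quality/depth ratio
MAX_FS = 60.0   # maximum Fisher's strand value
MIN_GQ = 20     # minimum nominal genotype quality
MAX_MQ0 = 5     # maximum number of mapping-quality-zero reads

def passes_site_filters(site):
    """Return True when a site (a dict of annotations) meets all four thresholds."""
    return (site["QD"] >= MIN_QD and site["FS"] <= MAX_FS
            and site["GQ"] >= MIN_GQ and site["MQ0"] <= MAX_MQ0)

def filter_sites(sites):
    """Keep only the sites that pass every filter."""
    return [site for site in sites if passes_site_filters(site)]

# Hypothetical annotated sites (values are illustrative only)
example_sites = [
    {"pos": 101, "QD": 12.5, "FS": 3.1, "GQ": 99, "MQ0": 0},
    {"pos": 202, "QD": 1.4, "FS": 2.0, "GQ": 45, "MQ0": 0},    # fails QD
    {"pos": 303, "QD": 20.0, "FS": 75.0, "GQ": 60, "MQ0": 1},  # fails FS
]
print([site["pos"] for site in filter_sites(example_sites)])
```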

Within-host diversity analysis

We called within-host SNP and indel variants for endosymbionts and mitochondria using the method from [87]. Briefly, we created mpileup files from BWA bam alignment files for all individuals from each host species using SAMTools [84]. Then, we called variants and calculated pairwise diversity using the perl script from [87], which only considers sites within one standard deviation of the average genome coverage, filters SNPs around indels, and requires an alternate allele count in excess of the cumulative binomial probability of sequencing error at that site. As very closely related sister taxa have not been sampled for most of these bacterial genomes, ancestral/derived alleles could not be identified and we could not plot unfolded allele frequency spectra. Instead, folded allele frequency spectra were calculated for minor alleles and plotted in R.

No heteroplasmy was detected within the mitochondrial within-host populations (see S7 Table), suggesting that these bivalves do not experience double uniparental mitochondrial inheritance. This is important given our expectations regarding mitochondrial-symbiont co-divergence under strict vertical transmission.

Genome analyses

Population genealogy inference

We produced multiple fasta sequence files for each population for the host mitochondrial genomes and for the concatenated symbiont genomes from the set of filtered consensus genotype calls. We then used the phylogenetic software package RAxML [88] using a GTR+G model and 1,000 bootstrap replicates to estimate the phylogenetic relationships among samples and to quantify uncertainty in our phylogenetic relationships. Using FigTree, we rooted the trees by their midpoints and created cladograms for topological comparisons.

Analysis of polymorphic and recombinant sites

Using the fasta files described above, we filtered sites to only retain biallelic SNPs with a minimum genotype quality of 10. Without indels, these resequenced genomes were already aligned. Then we used the aligned SNP data to calculate Watterson’s theta [89], pi [90], and the proportion of pairwise sites where all 4-gametes, i.e., all pairwise combinations of alleles, are represented. We then binned the 4-gamete sites by the distances between alleles, with bins at 1e1, 1e2, 1e3, 1e4, 1e5, and 1e6 bp, for model fitting (described below).
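The three summary statistics can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' script: haplotypes are given as equal-length strings of alleles at biallelic SNPs, the listed bin values are treated as inclusive upper edges, and all function names are hypothetical. Both diversity estimates are returned per sequence; dividing by alignment length would give per-site values.

```python
from bisect import bisect_left
from itertools import combinations

BIN_EDGES = [10, 100, 1_000, 10_000, 100_000, 1_000_000]  # bp, assumed upper edges

def wattersons_theta(num_segregating, num_samples):
    """Watterson's estimator: S divided by the harmonic number a_n."""
    a_n = sum(1.0 / i for i in range(1, num_samples))
    return num_segregating / a_n

def nucleotide_diversity(haplotypes):
    """pi: mean number of differences over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs)

def four_gamete_fraction_by_bin(positions, haplotypes):
    """Per distance bin, the fraction of SNP pairs showing all four allele
    combinations; None marks bins that contain no SNP pairs."""
    passed = [0] * len(BIN_EDGES)
    total = [0] * len(BIN_EDGES)
    for i, j in combinations(range(len(positions)), 2):
        b = bisect_left(BIN_EDGES, abs(positions[j] - positions[i]))
        if b == len(BIN_EDGES):
            continue  # pair is farther apart than the largest bin
        total[b] += 1
        if len({(h[i], h[j]) for h in haplotypes}) == 4:
            passed[b] += 1
    return [p / t if t else None for p, t in zip(passed, total)]

# Toy data: four haplotypes at two SNPs located 5 bp apart
haps = ["AA", "AT", "TA", "TT"]
print(wattersons_theta(2, len(haps)), nucleotide_diversity(haps))
print(four_gamete_fraction_by_bin([100, 105], haps))
```

The pairwise loop is quadratic in the number of SNPs, which is acceptable for a sketch but would need a windowed approach for genome-scale data.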

Whole genome structural alignment

We generated whole genome alignment plots by first aligning bacterial genome assemblies with MUMmer 3.23 [91], using the nucmer algorithm and default parameters. In addition to the Mb-scale genomes we assembled and referenced above, we obtained Mb-scale genomes for Bathymodiolus septemdierum str. Myojin knoll ([42], accession GCA_001547755.1), Bathymodiolus thermophilus str. EPR9N ([92], accession GCF_003711265.1), and Vesicomyosocius okutanii ([12], accession NC_009465.1) from NCBI for alignment. The nucmer output was converted into the BTAB format with the MUMmer tool show-coords and was then visualized using a custom Python script. When necessary, some scaffolds in the bacterial assemblies were split into two parts in order to convert a circular genome into a linear alignment plot.

We used the whole genome aligner progressiveMauve [93] to compare genome synteny on the 10s of Kb scale between the S. velum and S. elarraichensis symbionts. First, we reordered the contigs comprising the S. elarraichensis symbiont draft genome [15] by the S. velum symbiont assembly [13] with the reorder contig function in progressiveMauve. Then, we aligned the S. velum symbiont genome pairwise against the S. elarraichensis symbiont’s reordered contigs and the S. pervernicosa symbiont genome we assembled. We plotted the alignment backbone files in the R package genoPlotR [94].

Mobile element analysis

We identified mobile elements in the endosymbiont genome sequences by BLAST. First, we generated BLAST database files with the makeblastdb command from the ACLAME [95] and ICEberg [96] nucleotide and amino acid databases of transposable, viral, and conjugative elements. Next, we used blastp and blastn to compare endosymbiont amino acid sequences and full genome sequences, respectively, to these databases (cutoff values: minimum alignment length of 50 nucleotides or 50% amino acid query coverage, 90% identity, and e-value 1e-6). Overlapping hits were consolidated into a single mobile element-containing region (S12 Table).
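Consolidating overlapping hits into single regions is a standard interval-merging step. The sketch below assumes hits are given as inclusive (start, end) coordinates on the same contig and that only strictly overlapping hits are merged; both choices are assumptions, since the text does not specify them.

```python
def merge_overlapping_hits(hits):
    """Merge overlapping (start, end) intervals into non-overlapping regions."""
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if this hit reaches further
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]

# Hypothetical BLAST hit coordinates (illustrative only)
hits = [(150, 300), (100, 200), (400, 500), (450, 480)]
print(merge_overlapping_hits(hits))
```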

Ortholog identification

We identified putative orthologous sequences among sets of bacterial genomes by a reciprocal best BLAST approach. To do this, we first performed pairwise blasts between each pair of genomes’ coding sequences with blastn (-best_hit_overhang 0.1 -best_hit_score_edge 0.1 -evalue 1e-6), alternating each sequence as the query/subject. We parsed these results to only retain the best hits with >50% identity and >100 bp alignment lengths. Then, using a custom perl script, we compared hits between all pairs to identify genes with identical reciprocal best hits among all taxa each homologous gene was detected in. We used the resulting matrix of these reciprocal best hits to extract the coding sequences for each ortholog for each species from the genome fasta files for downstream analysis.
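For a single pair of genomes, the reciprocal best hit logic can be sketched as below. This is not the custom perl script referred to above: the tuple layout, the use of bit score to rank hits, and the function names are assumptions made for illustration. The identity and length cutoffs follow the text.

```python
def best_hits(rows, min_identity=50.0, min_length=100):
    """Map each query to its top-scoring subject among hits that pass the
    cutoffs. rows: (query, subject, percent_identity, aln_length, bitscore)."""
    best = {}
    for query, subject, identity, length, score in rows:
        if identity <= min_identity or length <= min_length:
            continue  # text requires >50% identity and >100 bp alignments
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)

# Hypothetical blastn results for genomes A and B (illustrative only)
a_vs_b = [("a1", "b1", 95.0, 900, 1500.0), ("a1", "b2", 80.0, 800, 600.0),
          ("a2", "b2", 90.0, 500, 700.0), ("a3", "b1", 60.0, 300, 200.0)]
b_vs_a = [("b1", "a1", 95.0, 900, 1500.0), ("b2", "a2", 90.0, 500, 700.0),
          ("b2", "a1", 80.0, 800, 600.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Extending this to many genomes, as described in the text, would require that the pairwise results agree across every pair of taxa in which the gene is detected.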

dN/dS analysis

To evaluate the impact of homologous recombination on patterns of natural selection at the molecular level over long periods of time we computed the average fixation rate among endosymbiont lineages at nonsynonymous and synonymous sites (dN/dS). This ratio of values is an approximate measure of the strength of purifying selection under the assumption that most nonsynonymous substitutions are deleterious. For all genes where a single ortholog was found to be shared among all symbiont lineages we began by producing codon-aware alignments using MACSE [97]. Then, we compared each orthologous alignment for pairs of symbiont lineages within each group (solemyid, vesicomyid, and bathymodiolin) to estimate dN and dS using the codeml package in the PAML v4.9 framework [98]. We excluded all comparisons for which dS < 0.05 or dS > 2, as values outside this range are often thought to yield unreliable estimates of rates of molecular evolution due to low statistical power and saturated substitutions, respectively. We then compared the distributions of dN/dS for each symbiont group comparison using a Wilcoxon test.
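The dS exclusion rule can be written as a short filter applied before the ratios are compared. This is a minimal sketch in which (dN, dS) pairs are assumed to have already been parsed from codeml output; the function name and input layout are hypothetical.

```python
def filtered_dnds(estimates, ds_min=0.05, ds_max=2.0):
    """Return dN/dS for comparisons whose dS lies within [ds_min, ds_max].
    estimates: iterable of (dN, dS) pairs."""
    return [dn / ds for dn, ds in estimates if ds_min <= ds <= ds_max]

# Hypothetical pairwise estimates (illustrative only)
pairs = [
    (0.01, 0.02),  # dS too low: too few synonymous changes to be informative
    (0.05, 0.50),  # retained
    (0.30, 3.00),  # dS too high: synonymous sites likely saturated
]
print(filtered_dnds(pairs))
```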

Divergence dating

Taxon selection

To construct dated phylogenies for hosts and symbionts, we downloaded related genomes from NCBI. For the host divergence analysis, all of the bivalve mitochondrial genomes available as of early 2020 and four gastropod mitochondrial genomes were downloaded to serve as ingroups and outgroups, respectively (35 total taxa: 31 bivalves and four gastropods; see S4 Table). For the symbiont divergence analysis, bacterial genomes were identified for inclusion in the analysis by BLAST [75]. While residing in a relatively constrained clade of proteobacteria, these chemosynthetic symbionts do not form a monophyletic clade, have free-living relatives, are basal to more derived groups in Gammaproteobacteria, and are currently taxonomically unclassified, so it was necessary to fish out related genomes by identifying sequence homology. To do this, we aligned the nucleotide coding sequences from each one of the seven symbiont genomes we sequenced and/or analyzed against NCBI’s Prokaryotic RefSeq Genomes database with blastn (-best_hit_overhang 0.1 -best_hit_score_edge 0.1 -evalue 1e-6). Based upon the diversity of hits across symbionts and genomes, we selected the top three best hits to each gene as taxa to include in the full genome divergence analysis (59 total taxa: 58 gammaproteobacteria and one alphaproteobacterium outgroup; see S5 Table).

Multiple sequence alignment

As bivalve mitochondrial genomes exhibit notoriously diverse structural arrangements [99], we used the rearrangement-aware whole genome aligner progressiveMauve [93] to align the molluscan mitochondrial genomes. These alignments were manually inspected and converted to fasta format in Geneious Prime (version 11.0.6+10) [100].

We identified and aligned orthologous proteins among these diverse bacterial genomes with bcgTree [101], then we back-translated the resulting amino acid alignments to nucleic acids with RevTrans [102]. As bcgTree only includes protein-coding genes, we manually extracted and aligned 16S and 23S ribosomal RNA sequences for these taxa with MAFFT (using the accurate mafft-linsi setting) [103]. Although recombination clearly occurs frequently in the solemyid and mytilid symbiont populations, we decided against removing recombinant sites because doing so may exacerbate recombination-induced artifacts [104]. Finally, we concatenated these alignments with the nucleotide alignments from bcgTree/RevTrans with a custom perl script and inspected them in Geneious Prime.

Phylogenetic inference and divergence dating

We first inferred maximum likelihood phylogenies with RAxML (version 8.2.1, with parameters: -f a -m GTRGAMMA -N 1000) [88] to verify that the taxa selected were able to resolve the relationships among hosts and among symbionts and free-living bacteria. Both mitochondrial and symbiont phylogenies were well-resolved (S2 Fig).

We inferred Bayesian phylogenies and dated node divergences for host mitochondria in Beast2 [105]. After several rounds of parameter testing to ascertain the speciation model and calibration date distribution that best fit the data (see Section 1 of S1 Text), we selected the Yule model of speciation, with a gamma distributed Hasegawa, Kishino, and Yano (HKY) model of substitution and a relaxed local molecular clock, and we calibrated dates to the base of the ingroup, Bivalvia. We used the fossil-based minimum appearance date for bivalves of 520 million years (first appearance estimated in Fossilworks [106] from fossil data in [107,108]). MCMC chains were run in duplicate until posterior probability convergence, around 8e8 steps for mitochondria. We also performed independent divergence date estimations in RelTime [109,110] and PATHd8 [111] to compare to the Beast2 results (see Section 1 of S1 Text and S6 Table).

Given the consistency in RelTime and Beast2 estimates for mitochondria (S6 Table) and the unreasonably long run times necessary (several weeks) for symbiont dated phylogenies to reach posterior convergence in Beast2, we estimated symbiont divergence times in RelTime. We used the previously estimated divergence date for Gammaproteobacteria of 1.89 billion years (based on calibration to the cyanobacterial-caused atmospheric oxygenation event [112]) with a log-normal distribution to calibrate a relaxed local clock, using a gamma-distributed Tamura-Nei model of substitution, and allowing for invariant sites. All trees were plotted in FigTree.

Symbiont species descriptions

Using the genomic, phylogenetic, and divergence data we generated above, we diagnosed and described the six symbiont species sequenced in this study. These classifications will be helpful in future investigations and discussions of symbiont function and diversity. Diagnoses of symbiont genera and descriptions of symbiont species are provided in Section 3 of S1 Text and listed in S3 Table.

Parameter estimation via approximate Bayesian computation

Simulation setup

To simultaneously estimate the rates of horizontal transmission, effective homologous recombination rates, and the recombinant tract length, we used an approximate Bayesian computation approach. Here, we define the population-scaled per-base pair recombination rate, rho, to be equal to two times the effective population size times the per-base pair rate of gene conversion (rho = 2*Ner). The recombination tract length, l, is defined as the length of the gene conversion segment in base pairs. We used the bacterial sequential Markovian coalescent simulation framework, FastSimBac [113], to simulate neutral coalescence across a range of input parameters (see Section 2 of S1 Text for model proof). We drew the effective mutation rate from a log-uniform (3e-5,1e-2) distribution, the effective recombination rate from a log-uniform (1e-6,1e-2) distribution and the recombinant tract length from a uniform (1,1e5) distribution. Because the clonal frame cannot be inferred for the Bathymodiolus and Solemya symbiont populations, presumably due to the high levels of recombination relative to other bacteria, we did not supply the program with a fixed precomputed clonal frame for any simulations. In total we performed 100,000 simulations and we used subsets from each simulation to obtain summary statistics to train the variable sample size simulations.
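Drawing simulation inputs from the stated prior distributions can be sketched as follows. This illustrates only the sampling step and does not invoke FastSimBac; the dictionary keys and function names are hypothetical, and the seed is arbitrary.

```python
import math
import random

def log_uniform(rng, low, high):
    """Draw a value whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

def draw_parameters(rng):
    """Draw one parameter set from the priors given in the text."""
    return {
        "mutation_rate": log_uniform(rng, 3e-5, 1e-2),       # effective mutation rate
        "recombination_rate": log_uniform(rng, 1e-6, 1e-2),  # effective recombination rate
        "tract_length": rng.uniform(1, 1e5),                 # recombinant tract length, bp
    }

rng = random.Random(0)  # arbitrary seed for reproducibility
draws = [draw_parameters(rng) for _ in range(5)]
for draw in draws:
    print(draw)
```

Log-uniform priors spread the draws evenly across orders of magnitude, which suits rate parameters whose plausible values span several decades.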

Summary statistics

We selected a set of summary statistics that each incorporate some feature of the overall diversity (relevant for theta), and the overall effective recombination rate per site (rho*l). Specifically, we computed two estimators of theta, Watterson’s theta [89], and pi [90], and we included the proportion of pairs of non-singleton SNPs at various genomic distances where all four possible combinations of alleles are observed in our sample. We placed the divisions between distance bins for pairwise comparisons of sites at 1e1, 1e2, 1e3, 1e4, 1e5, 1e6 base pairs.

Model fitting via random forest regression

We used the scikit-learn package to perform random forest regression to obtain estimates of each parameter for each endosymbiont population using the 4-gamete sites identified above and a custom Python script available on GitHub. First, we confirmed that our summary statistics are sufficient to accurately fit our desired population parameters using out-of-bag score during model training and for each population sample size (S9 Table). Although the score is often slightly lower for smaller sample sizes, this approach performs sufficiently well and consistently across samples for our applications here. We additionally obtained confidence intervals for each parameter estimate using the forestci package [114]. It should be noted that our simulations assume an equilibrium population. If this assumption is violated in a subset of the taxa that we examined, it might affect our parameter estimates. Nonetheless, it is unlikely that the large-scale differences in estimated parameters that we observe among groups, which are consistent with prior expectations, are entirely attributable to this potential bias.

Method validation on existing datasets

In light of the relatively high recombination rates that we estimate, it is valuable to confirm that our approach for estimating rho and theta performs as expected. We therefore applied our method to the Bacillus cereus dataset [115] that has been studied in similar contexts using several related approaches [51,113,115]. In prior work with this dataset, estimates of the total impact of recombination, rho*l/theta, have varied somewhat, from approximately 3.7 [113] to 35.9 [115] and 229 [51]. Using our method, we obtain a value intermediate to these at 17.1, which indicates a moderate impact of recombination on genome evolution. This result suggests that our method is reliable and does not substantially inflate recombination rate estimates.

Read-backed phasing to confirm recombination estimates

Because we used consensus symbiont genomes during model fitting, it could be possible that the high estimated rates of recombination in solemyid and bathymodiolin samples are an artefact of differential haplotype coverage across the genomes of genetically diverse symbiont chromosomes. We therefore sought to confirm our recombination rate estimates by comparing the rate of occurrence of all four possible configurations of two proximal alleles using read-backed phasing. Because each read pair must ultimately derive from a single DNA fragment, when two alleles are observed on the same read or read pair, in the absence of errors, they must reflect an allelic combination that is present within a single bacterial chromosome.

We therefore analyzed in aggregate all reads that overlapped two or more consensus alleles for each population and computed the fraction of all four possible sampling configurations. More specifically, we computed the proportion of reads sampled from across all reads in all individuals that contained alleles, AB, Ab, aB, and ab, for two adjacent biallelic sites with alleles A/a and B/b. To reduce the impact of sequencing errors, we recorded a site as containing all four possible configurations when all configurations were present at proportion greater than 0.05 in the total set of read pairs. We further limited comparisons to alleles where we could obtain at least 100 observations of both sites on single read pairs. We excluded the populations of vesicomyid endosymbionts from this analysis because too few polymorphic sites were present within the distances spanned by individual read pairs to confidently infer the frequencies of 4D sites. However, because samples from these populations contain virtually no within-host variation, we have little reason to doubt the accuracy of the consensus genotypes resulting from differential coverage of genetically diverse bacterial haplotypes.
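The decision rule for a single pair of sites can be sketched as below. It assumes that the allele pairs observed on individual reads or read pairs have already been extracted from the alignments; the function name, input layout, and the None return for under-sampled site pairs are illustrative choices rather than the authors' implementation.

```python
from collections import Counter
from itertools import product

def all_four_configurations(observations, site1=("A", "a"), site2=("B", "b"),
                            min_fraction=0.05, min_observations=100):
    """observations: list of (allele_at_site1, allele_at_site2) tuples, one per
    read or read pair spanning both sites. Returns None when there are fewer
    than min_observations, otherwise True only if AB, Ab, aB, and ab each
    exceed min_fraction of the observations."""
    total = len(observations)
    if total < min_observations:
        return None
    counts = Counter(observations)
    return all(counts[(x, y)] / total > min_fraction
               for x, y in product(site1, site2))

# Toy read-pair observations (illustrative only)
recombinant = ([("A", "B")] * 40 + [("A", "b")] * 30
               + [("a", "B")] * 20 + [("a", "b")] * 10)
error_like = ([("A", "B")] * 96 + [("A", "b")] * 2
              + [("a", "B")] * 1 + [("a", "b")] * 1)
print(all_four_configurations(recombinant), all_four_configurations(error_like))
```

Requiring each configuration to exceed 5% of observations means that rare configurations produced by sequencing error, as in the second toy example, do not count as evidence of recombination.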

Because we are sampling a much larger number of lineages than we did in analyzing the consensus genome sequences, we would expect, if anything, to observe higher rates of sites where all four possible allele configurations are present. This is precisely what we found. Specifically, we observe higher proportions of pairs of sites where all four possible allelic combinations are represented at each distance considered and in each population in the read-backed dataset than in the consensus chromosomes (S10 Table). Furthermore, because we have placed conservative cutoffs on the proportions of read pairs required to consider a pair of sites as containing all four alleles, these values are likely underestimates of the true rates. Our observed rates of four-gamete test failures are therefore consistent with our analyses of consensus genome sequences and confirm that recombination must be common within these endosymbiont populations.

Supporting information

S1 Fig. Overview of the genomic data production and analysis steps used to study the population genomic processes influencing endosymbiont genome evolution.

(TIF)

S2 Fig. Maximum likelihood phylogenies for host mitochondria (top) and symbionts (bottom).

Groups of chemosynthetic associations are colored as in Fig 1: yellow = vesicomyids, green = solemyids, and blue = bathymodiolids. Mitochondrial and symbiont trees are rooted by gastropod and alphaproteobacterial outgroups, respectively. Scale bar = substitutions per site. Bootstrap support values indicated at nodes.

(TIF)

S3 Fig. Within-host symbiont folded allele frequency spectra for all B. septemdierum and B. childressi intrahost samples with more than 50x and 45x Illumina sequencing coverage, respectively (see S1 Table for coverages and S8 Table for diversity statistics).

(TIF)

S4 Fig. Within-host symbiont folded allele frequency spectra for all Solemya velum and Solemya pervernicosa intrahost samples with more than 50x Illumina sequencing coverage (see S1 Table for coverages and S8 Table for diversity statistics).

(TIF)

S5 Fig. Within-host folded allele frequency spectra for all Calyptogena fausta and Calyptogena magnifica intrahost samples with at least 50x Illumina sequencing coverage (see S1 Table for coverages and S8 Table for diversity statistics).

(TIF)

S6 Fig. Endosymbiont inheritance modes.

Our generalized coalescent model of endosymbiont inheritance includes symbiont transmission modes ranging from strict horizontal transmission to strict vertical transmission, with mixed modes, exhibiting both horizontal and vertical strategies. The host populations (grey) undergo Wright-Fisher reproduction. Endosymbiont lineages (red and blue) either switch between host lineages or are inherited, depending on the transmission mode, until they coalesce in the same host lineage (purple).

(TIF)

S7 Fig. The observed number of pairwise differences across a range of parameters under the endosymbiont population model described above.

Each distribution is 100 replicates with varying NH, H, and NS. The expectation following Equation 9 above is plotted as a red line and differs by less than 2 segregating sites from the observed mean for all cases investigated here.

(TIF)

S8 Fig. Local alignments suggest that few rearrangements have occurred between the S. velum and S. elarraichensis symbiont genomes.

S. elarraichensis symbiont is the closest known relative of the S. velum symbiont, however material is exceedingly hard to obtain for this association, which occurs at a mud volcano at approximately 500–1000 m depth, and only a fragmented draft genome assembly was available. However, even these relatively short range segments reveal complete synteny (left). In comparison, over the same genomic distances, many rearrangements are evident between S. velum and S. pervernicosa (right), with the minority of segments retaining synteny.

(TIF)

S1 Table. Sample, sequencing library, and mapping coverage information.

The second set of coverages listed for C. fausta apply to the libraries used for the intra-host analysis.

(XLSX)

S2 Table. De novo reference assemblies were assembled with Nanopore reads and polished with Illumina data.

Illumina reads were used for individual sample genotype calling. There were no gaps (Ns) in any of the assemblies. The percent complete measure reflects how many of the 34 "essential genes" (see Materials and Methods) were found in the assembled genomes.

(XLSX)

S3 Table. Symbiont species named in this study and named previously.

See S3 Supporting Text for full diagnoses and descriptions.

(XLSX)

S4 Table. Taxa and accession numbers used in the mitochondrial genome phylogenetic analysis and divergence dating.

(XLSX)

S5 Table. Taxa and accession numbers used in the bacterial whole genome phylogenetic analysis and divergence dating.

(XLSX)

S6 Table. Divergence date estimates from different Beast2, TimeTree, and PATHd8 runs with different parameter values.

(XLSX)

S7 Table. Between-host symbiont population statistics calculated from consensus symbiont and mitochondrial genome sequences.

Random Forest (RF) theta and log10(rho*l) estimates were inferred by fitting genome-wide values of pi, Watterson’s Theta, and 4-gamete sites to values generated in coalescent simulations.

(XLSX)

S8 Table. Within-host symbiont and mitochondrial genetic diversity statistics.

Mapping coverages in S1 Table.

(XLSX)

S9 Table. Out-of-bag (oob) scores for random-forest models for each parameter of interest, rho*l and theta, and for each sample size of endosymbiont individuals considered.

Oob scores indicate how often the trained model is able to predict known values, with perfect prediction equal to one.

(XLSX)

S10 Table. Proportion of within-host variant sites that pass the 4-gamete test for recombination based upon read and read pair data over the given genomic intervals (constrained by Illumina library insert sizes).

(XLSX)

S11 Table. Comparative symbiont genome statistics and mobile element (ME) content.

MEs were identified as regions within the symbiont genomes with high sequence identity to elements in insertion sequence, phage, and integrative conjugative element databases.

(XLSX)

S12 Table. Full list of ICEberg and ACLAME database mobile element hits with ≥90% sequence identity to endosymbiont genomic regions and genes.

(XLSX)

S1 Text. Supplemental text sections 1–3 for Forever young bacterial symbiont genomes.

(PDF)

Acknowledgments

We thank two anonymous reviewers and Emma George for their helpful comments on the manuscript and Xavier Didelot for kindly providing the Bacillus cereus validation dataset. For Calyptogena freezer samples collected previously, we thank Colleen Cavanaugh and Peter Girguis. For their assistance in collecting Bathymodiolus samples from the Lau Basin and Veatch Canyon, respectively, we thank the crews of the R/V Falkor and ROV ROPOS and the R/V Atlantis and HOV Alvin. We thank Peter Wilton for feedback on the symbiont coalescent model derivation.

Data Availability

Data and genome assemblies generated in this study are available through NCBI BioProject number PRJNA562081 (BioSample numbers listed in S1 Table). Code written and used in our analyses is available from https://github.com/shelbirussell/ForeverYoungGenomes_Russell-et-al. Underlying numerical data for all graphs and summary statistics are available as Supporting Information.

Funding Statement

This work was supported by UC Santa Cruz, Harvard University, the Alfred P. Sloan Foundation (to RCD; sloan.org), and the NIH (R35GM128932 to RCD; nih.gov). Funding for Lau Basin collections was provided by the Schmidt Ocean Institute and NSF (OCE-1819530 to RB; nsf.gov). Funding for the Veatch Canyon collection was provided via a UNOLS Early Career Training Cruise Program funded by the NSF (OCE-1641453, OCE-1638805, OCE-1214335, OCE-1655587, and OCE-1649756; nsf.gov) and the ONR (N00014-15-1-2583; onr.navy.mil). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Decision Letter 0

Kirsten Bomblies, Xavier Didelot

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

6 Jan 2020

Dear Dr Russell,

Thank you very much for submitting your Research Article entitled 'Horizontal transmission and recombination maintain forever young bacterial symbiont genomes' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by three independent peer reviewers with complementary expertise. The reviewers appreciated the attention to an important problem, but raised some substantial and thoughtful concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review again a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address all the specific points made by each of the three reviewers. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 90 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see our guidelines.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Xavier Didelot

Associate Editor

PLOS Genetics

Kirsten Bomblies

Section Editor: Evolution

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This manuscript by Russell et al is a very interesting investigation of the evolution of bacterial endosymbionts of molluscs. The analyses are varied and complex, providing an overview of three different types of endosymbiotic association, and how they affect the evolution of the bacteria involved. The work has the potential to be an important contribution to the literature on fundamental processes in bacterial evolution, but the methodological novelty of the study means there is substantial clarification and validation required before the manuscript is ready for publication.

The main points that need to be addressed are:

(1) The title makes reference to these bacteria maintaining “forever young” genomes. However, there is not much evidence for the long-term stability of the observed genotypes. S5 Table suggests there is not much diversity within each bacterial species (with very little diversity in the Calyptogena populations), and S7 Fig suggests much of this is likely to be the result of a small number of long branches. Using the most relevant molecular clock data available, the authors should (A) estimate the time to the most recent common ancestor of the sequenced bacteria (eliminating the diversification through recombination as the coalescent analysis should allow, but it should not make much difference for Calyptogena), and (B) compare this to the estimated times at which the host species diverged. This should estimate the duration over which the symbiosis can be analysed. An alternative hypothesis to that proposed by the authors is that the ability for horizontal acquisition of these bacteria by hosts makes it possible for environmental bacteria to invade the niche and displace the resident symbionts (the “symbiont replacement” mentioned in the Introduction), explaining the observed data (including any most recent common ancestors that substantially post-date the divergence of hosts) without the genomes being “forever young”. Rather, this would support the model in which the genomes of the symbionts degrade, and are intermittently replaced by new species from the environment.

(2) The authors state “the other [non-vesicomyid] symbiont genomes are highly structurally dynamic”. However, Figure 4E demonstrates that these non-vesicomyid symbionts are polyphyletic – it is not surprising that there are many changes distinguishing the aligned pairs. The authors need to provide evidence that these representatives have had an endosymbiont lifestyle over the full period of their divergence (are the related species all endosymbionts?), or make clear they have become endosymbionts in parallel, and the genetic divergence has occurred outside of these hosts. The rest of the paragraph is confusing, not least owing to a lack of specificity (“It is”, “these groups”). The authors state that “This could proceed by iteratively locking deletions (initially caused by recombination… With recombination events occurring so rarely in these endosymbiont genomes, natural selection is unable to efficiently purge deleterious deletions. Inversions, which can be highly mutagenic by inverting the translated strand, inducing replication-transcription machinery collisions [33], may be nearly absent due to their high fitness costs.” – does recombination increase or decrease the rate at which deletions accumulate, and is selection weak (enabling deletions) or strong (preventing inversions)? Additionally, when stating “This is consistent with a combination of free-living and host associated periods”, they should specify whether this refers to the species’ current lifestyles (which would conflict with the description of these bacteria as obligate symbionts in the abstract, though this phrase is not repeated in the main paper), or over the history of the observed species.

(3) The authors refer to recombination repeatedly, but never specify whether they mean homologous recombination (cross over or gene conversion), horizontal gene transfer, or mobile elements. They should make clear which they mean in each analysis. In the S2 Supporting Text, they refer to “When a recombination event occurs on a genealogy, it creates an additional lineage to the right of the recombination breakpoint.” – this sounds like a crossover event, which is not a typical form of recombination in bacteria. The text does not specify the interpretation of rho, rho*l or r/m sufficiently. Most bacterial geneticists would interpret r/m as being the ratio of single nucleotide variants occurring through recombination to those occurring through point mutation (e.g. Feil et al 1999 Mol. Biol. Evol. 16(11):1496–1502), yet the r/m for C. fausta is 3625 (it is rarely over 10 for any bacterial species except the most rapidly recombining), despite it having a low rate of recombination and only 815 segregating sites. It seems likely the analysis is prone to false positive inferences of recombination – the calculated rates are high, and even bacteria lacking recA were found to undergo recombination. The authors should report the estimated rate of recombination for an analogous analysis of the mitochondrial DNA sequences to demonstrate the method is robust to false positives – perhaps additionally the method could be validated against datasets where r/m has been calculated by alternative methods?
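
As a rough illustration of why the interpretation of these statistics matters, the sketch below contrasts two ways of scaling recombination against mutation. It assumes a ClonalFrame-style parameterization in which r/m = (rho/theta) × delta × nu, where delta is the mean length of a recombined tract and nu is the per-site divergence between donor and recipient. The function name and all numerical values are hypothetical and are not taken from the manuscript. Equating the manuscript's rho*l with rho × delta is likewise an assumption.

```python
def recombination_impact(rho, theta, tract_len, donor_divergence):
    """Contrast two ways of scaling recombination against mutation.

    rho, theta       : per-site recombination and mutation rates (same time scaling)
    tract_len        : mean length of a recombined tract, in sites (hypothetical)
    donor_divergence : per-site divergence between donor and recipient (hypothetical)
    """
    # Sites covered by recombined tracts per mutation event (rho * l / theta).
    sites_per_mutation = rho * tract_len / theta
    # Substitutions introduced by recombination per point mutation (r/m):
    # only sites that differ between donor and recipient actually change.
    r_over_m = sites_per_mutation * donor_divergence
    return sites_per_mutation, r_over_m


# Illustrative values only: equal per-site rates, 1 kb tracts, 0.5% donor divergence.
covered, r_m = recombination_impact(rho=0.001, theta=0.001,
                                    tract_len=1000, donor_divergence=0.005)
print(f"rho*l/theta = {covered:.0f}, r/m = {r_m:.1f}")
```

Under these assumed inputs the first quantity is 200 times the second (the factor is 1/nu), which suggests how a low-diversity population could show a very large rho*l/theta while its r/m in the sense of Feil et al. remains modest.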

Minor criticisms:

• S1 Table should be a spreadsheet with accession numbers for the included samples

• Figure 3B needs labelled scale bars – the trees should be presented more clearly, with leaf tip names clearly readable. Data from S7 Fig should be included as well.

• Figure 4E should be a separate, enlarged figure, with leaf tip names clearly readable

• The supplementary tables need more detailed legends, particularly S4 Table, S6-S8 Tables

• The authors need to explain how the p values on Pg. 9 Ln. 164-167 were calculated

• Why were genes with extreme dN/dS ratios excluded, rather than genes with small numbers of mutations? “Wilcox test” should be “Wilcoxon test”.

• The four gamete test should be more thoroughly explained.

• “Nanopore” and “Illumina” are sometimes not capitalized. There are contractions in the main and supplementary text (e.g. “that’s”).
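
For readers unfamiliar with the four gamete test mentioned above, a minimal sketch follows. Under an infinite-sites assumption (no recurrent mutation), two biallelic sites can show all four allele combinations only if recombination has occurred between them. The haplotype encoding and function names are hypothetical, and this is the textbook pairwise form of the test, not necessarily the implementation used in the manuscript.

```python
from itertools import combinations


def four_gametes_present(site_a, site_b):
    """True if all four two-site haplotypes occur (both sites assumed biallelic)."""
    return len(set(zip(site_a, site_b))) == 4


def count_incompatible_pairs(haplotypes):
    """Count site pairs that fail the four gamete test.

    haplotypes: equal-length strings of '0'/'1', one per sampled genome,
    restricted to biallelic segregating sites (hypothetical encoding).
    """
    # Transpose so that each element holds the alleles at one site.
    sites = list(zip(*haplotypes))
    return sum(four_gametes_present(a, b) for a, b in combinations(sites, 2))


# Sites 1 and 2 carry 00, 01, 10 and 11, so that pair is incompatible
# with a single tree without recombination or recurrent mutation.
sample = ["000", "010", "100", "110"]
print(count_incompatible_pairs(sample))
```

Because recurrent mutation can also produce all four combinations, a positive result is better read as evidence consistent with recombination than as proof of it.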

Reviewer #2: Summary:

This study investigates the genome evolution of gammaproteobacterial symbionts from three groups of bivalves and provides an interesting symbiont study system, with the three host groups exhibiting different modes of symbiont transmission: horizontal, vertical and mixed mode transmission. Surprisingly, one symbiont group with reduced genomes has maintained a relatively stable genome size that is likely due to a moderate genetic recombination rate compared to others like insect symbionts with severely reduced genomes. This is a great study, but there are several aspects that need clarification and the authors have focused on the idea that the symbionts can maintain stable genome size indefinitely without much evidence or a discussion of other possible outcomes.

Major Comments:

Although using “forever young” in the title is catchy, this may not be true as stated in lines 186 – 187. At this moment in time, vesicomyid symbionts may be able to slow genome degradation and maintain an intermediate genome size, but the processes of Muller’s ratchet are still at work and continued genome reduction/degradation is a likely outcome. In which case, lines 206 – 207 also need to be rephrased, especially the term “indefinitely”.

I found it hard to follow the population genetics discussion without having more ecological context of the bivalve systems. Please include a short description of the transmission modes (how many symbionts are transferred during vertical transmission, how host switching occurs, how the hosts gain symbionts from the environment, are the symbionts obligate intracellular bacteria or facultative, etc.).

An “organelle-like state” is mentioned throughout the paper, but this term is highly debated. What is your definition of organelle-like?

Line 685 (Figure 1): It’s interesting that the vesicomyid symbiont and mitochondrial phylogenies are discordant since those symbionts are vertically transmitted. Can you explain more about why there are “sufficient amounts of horizontal transmission” occurring in vesicomyid symbionts? Also, please explain the difference between host-switching and horizontal transmission in vesicomyid symbionts. Can these symbionts be found in the environment (e.g. free-living) or do they become transferred directly from other non-related hosts?

Another aspect that I found interesting is that the vesicomyid symbionts have the smallest genomes of the bivalve symbionts but evolved from their free-living ancestors much later (66 mya) than the solemyid symbionts (400 mya). Therefore, vesicomyid symbiont genome reduction occurred relatively quickly compared to the solemyid symbionts. Also, the solemyid symbionts seem to have evolved from free living ancestors multiple times whereas the vesicomyid symbionts arose only once. Are these differences due strictly to transmission modes or are there other factors involved?

Also according to the symbiont phylogenetic tree, the solemyid symbionts evolved multiple times from free living ancestors and the mytilid symbionts evolved at least twice. Have these hosts replaced symbionts multiple times with free-living bacteria that eventually became new symbionts and thus, a range of symbiont genome sizes exist (at least in the solemyid symbionts)? This has happened in other symbiosis systems like mealybugs (Husnik et al. 2016), lice (Smith et al. 2013) and the protist, Euplotes (Boscaro et al. 2017) so it seems possible that these host groups also replaced the symbionts when deleterious mutations were fixed in symbiont populations, especially mutations in carbon fixation pathways.

Although a few examples of other symbiosis systems are included, a quick summary about the processes that drive genome reduction in other symbionts (e.g. insect symbionts) compared to those in the bivalve symbionts is needed. What are some differences and/or similarities between the systems (selection, drift, transmission, age of the symbiosis, etc.)? This would be a nice place to discuss why the vesicomyid symbionts have maintained a stable genome size whereas other systems have severely reduced symbionts. What are the processes involved and why would the vesicomyid symbionts have a different outcome?

Minor comments:

Please introduce the study system in the abstract since it is not mentioned until the last introduction paragraph.

(Line 33) Eukaryotes have not existed for the “history of life on Earth”

(Line 75 – 76) There are known marine symbiont genomes less than 1 Mb: bacterium AB1 in the marine invertebrate, Bugula neritina (Miller et al. 2016); Spiroplasma holothuricola in sea cucumbers (He et al. 2018); alphaproteobacteria in the marine protists, diplonemids (George et al. 2019)

(Figure 2) Switch Panel C and D labels so that the labels go in alphabetical order.

Reviewer #3: The authors ask an important and timely question about how the long-term stability of host-symbiont associations influences the symbiont genome evolution. They address the question by comparing some genomic properties of chemosynthetic gill-inhabiting bacterial symbionts in seven species representing three families of bivalves, with different modes of the symbiont transmission.

The authors have generated substantial amounts of novel quality data. I feel that the presented work has the potential of adding substantially to our understanding of symbioses of these bivalves, and symbioses more generally. However, I have several major concerns.

The presented data focus on seven bivalve species from three families with different symbiont transmission modes. Considering this, I feel that the authors' claim of "testing the canonical theory" "by sampling marine endosymbionts that range from primarily vertical to strictly horizontal transmission", and that "These results validate long-standing but untested theory and demand a reinterpretation of the vast diversity of symbioses..." is an overstatement. I argue that major evolutionary theories relevant to a wide range of systems cannot be adequately tested using only three independent data points (=families). While the theoretical background provided in the Introduction is solid and relevant, I strongly recommend that the authors shift the focus of this study to what the study actually is: a comparative description of symbioses in three families of bivalves.

In an apparent attempt to make the paper appear relevant to a broad audience, the authors hid the information of what they actually studied. I found it striking that bivalves - the animal group that was studied - are not mentioned in the title, abstract, or author summary, not even once! Furthermore, the authors provided very little introductory background information for these bivalves or their symbionts [lines 100-102]. As the only reference to support the statement that the three bivalve families differ in their modes of symbiont transmission, they provided a meta-analysis of >500 host-microbe symbioses, and some of the results of the current study. More information about the nature and old age of these symbioses is provided at the end of the Discussion - not a place where readers typically look early on. Without that background information or experience with bivalve symbioses, I have found it difficult to understand what the authors have done and why, and what the results mean. Some of the key questions that the Introduction should address are:

* What is known about bivalve symbioses? What are their functions and localizations? How specific are they?

* Are the focal families obligately associated with symbionts?

* In their gills, do bivalves host single symbiont strains, or more complex multi-strain or multi-species microbiota?

* How do these symbionts transmit, and what was known before the current study?

* What has been known about the functions of these symbionts?

In my opinion, the authors need to develop the second half of the Introduction around these questions, explaining how their presented work fills gaps in our existing knowledge.

Another major weakness of the study is the way information is organized in the article. I struggled to understand what exactly the authors did, how they did it, what was the reasoning, and which of the findings are new. I provided one example in the previous paragraph: the authors state that they have compared bivalve families with different symbiont transmission modes, citing a meta-analysis and some parts of the current study. Then, how much has been known about the transmission in the selected groups of bivalves? Differences in transmission modes ... is this an entirely new finding, or has there been prior data? After completing this review, I am still not sure.

I realize that the journal requires that the methodological details are only provided in the last section and/or in the supplement. Despite this, I would expect to find a summary of what the authors did at the end of the Introduction, or perhaps at the beginnings of paragraphs corresponding to Results. That was not the case here. It took me a substantial amount of effort (jumping between sections of the main text and the supplement) to understand that the authors have sequenced and analyzed gill metagenomes from multiple individuals from a total of eight populations of seven bivalve species. I further understood that for one individual per species, a reference symbiont + mitochondrial genome was assembled, facilitating the use of lower-coverage data for the symbiont diversity comparisons within and across hosts. Despite my background in invertebrate symbiont diversity and genomics, I failed to understand some of the analyses. A summary of the experimental setup would have greatly helped me follow the text. Please make it easy for readers to understand what you did!

I am puzzled by some of the findings of the study, and their interpretation. The authors claim in the Abstract that they compare symbiotic systems "that range from primarily vertical to strictly horizontal transmission". However, the host-symbiont phylogeny comparison (Fig. 1C) does not make it clear that the differences among host species are significant. At the same time, I am surprised that when sampling replicate individuals from a population, the authors discovered substantial mitogenomic diversity. That is not a pattern I have seen in insects; is it expected in bivalves? The authors seem to be using the smaller genome sizes of vesicomyid symbionts, lower recombination rates and other genomic features as a secondary confirmation of the deduced mode of transmission. Once again, was the mode of transmission known before the study? If not, then what is the likelihood that the observed patterns are due to biological differences among hosts, for example in the intra-population diversity?

I have not been familiar with folded allele frequency spectrum analyses, and I struggled to comprehend the plots in Figs 2-3 and S2-S4. Perhaps the authors could explain and discuss the analyses and patterns in a way that makes it easier for biologists not familiar with that particular method to understand the expectations and interpret patterns from these plots?
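
As a minimal sketch of what a folded allele frequency spectrum summarizes: when the ancestral allele at a site is unknown, each variable site is scored by the count of its less common (minor) allele among the n sampled genomes, and the spectrum is the histogram of those counts. The function name and input counts below are hypothetical and are not drawn from the study's data.

```python
def folded_sfs(allele_counts, n):
    """Folded site frequency spectrum for n sampled genomes.

    allele_counts: for each site, how many of the n genomes carry one
    arbitrarily chosen allele (hypothetical input format).
    Returns a list where entry k is the number of sites whose minor
    allele is carried by exactly k genomes (entry 0 is always 0).
    """
    sfs = [0] * (n // 2 + 1)
    for count in allele_counts:
        if 0 < count < n:  # skip sites that do not vary in the sample
            # Folding: a count of c and a count of n - c are equivalent.
            sfs[min(count, n - count)] += 1
    return sfs


# Seven sites scored in six genomes; two of them are invariant in the sample.
print(folded_sfs([1, 5, 2, 3, 6, 0, 4], n=6))
```

In plots of this kind, an excess of sites in the lowest-count classes relative to neutral expectations is commonly read as a sign of purifying selection or recent population growth, although other explanations are possible.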

In the main text, the authors talk about vesicomyid, mytilid/bathymodiolin, and solemyid symbionts. In the figures, they provide Latin names of host species. I found it somewhat difficult to match and navigate the sets of names, and ended up searching multiple times which species belongs to which group. Perhaps the authors could think of ways of editing figures in a way that would help readers make the connection?

Finally, the authors use the genomic data for phylogenetic reconstructions. The addition of functional / contents analysis would have made the current story much stronger.

To sum up, I feel that the dataset holds strong promise for a publication. I strongly recommend that the authors present these data in the context of filling gaps in the understanding of bivalve symbioses. The systematic testing of whether the symbiont genomic patterns align with expectations for a given transmission mode would be a useful addition, but should not be the main focus. The authors need to explain in a clear and accessible way what was the state of the field, and which of the findings are new. They also need to state clearly what they did and why, how the findings complement existing knowledge, and what the results mean.

Specific comments:

Lines 210-211: The authors wrote: "These results validate long-standing but untested theory and demand a reinterpretation of the vast diversity of symbioses...". I am finding the claim that "the vast diversity of symbioses" should be reinterpreted based on data for three families that largely conform to the expectations quite arrogant.

Lines 231-233: Name kit/enzyme manufacturers

Lines 233-234: "to uniquely label both i5 and i7 indexes for each sample that was sequenced on a single lane of a Hiseq4000. We sequenced a total of four lanes of Hiseq4000"... Before I read into the supplement, I was quite confused. I initially understood that six spp. were sequenced, each filling its own lane. Change the wording.

Lines 297-298: The authors wrote "After producing endosymbiont genome assemblies and host mitochondrial genome assemblies for each host/endosymbiont population...". This is misleading and confusing: the assemblies were provided for a single host individual per population.

Fig. 1C. What genes are the trees based on? Please provide scale bars. What are the outgroups?

Fig. 4E. The sample labels are provided, but the font size makes them virtually illegible.

Table S1. Why are the genome coverage values for nanopore reads provided as "na"?

Table S2. What is the advantage of providing the genome, scaffold etc. sizes in the format "2.37E+06" as opposed to the actual values?

In Table S3, the authors provide the list of "Symbiont species named in this study and named previously", including newly proposed names. The table is only referenced once in the main manuscript, but no information about that nomenclatural aspect is provided.

Also, I am not a Latin expert, but I believe that the proposed symbiont names are incorrect grammatically. The generic names should be nouns in the nominative, singular form, with endings corresponding to declension - but as far as I can tell, several of the generic names proposed here do not conform to these conventions. Please consult the bacterial nomenclature rules.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: It is not yet possible to tell if the data are all present in the databases, but a project accession number is provided by the authors, who indicate the data will be released on publication.

Reviewer #2: Yes

Reviewer #3: No: The authors state that the genomic data will be published at NCBI upon article acceptance

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Emma E. George

Reviewer #3: No

Decision Letter 1

Kirsten Bomblies, Xavier Didelot

15 May 2020

* Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. *

Dear Dr Russell,

Thank you very much for submitting your Research Article entitled 'Horizontal transmission and recombination maintain forever young bacterial symbiont genomes' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Xavier Didelot

Associate Editor

PLOS Genetics

Kirsten Bomblies

Section Editor: Evolution

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank Russell et al for their thoughtful and extensive response to my previous comments. The phylodynamic analyses in Figure 5 are excellent, and a valuable addition – to me, it would seem helpful to have this as Figure 1, as it allows a non-specialist reader to immediately understand how divergent the host and endosymbionts are, and the durations of their associations. The validation of the recombination analysis method is also nicely done, although there are a couple of aspects it would be useful to address in point 2 below.

The remaining problem is not concerning the data, but rather the interpretation of the link between recombination, selection and genome structure. Given the authors’ answers to my first review, there are three questions that it seems important to address:

(1) As the authors state all recombination they refer to is homologous recombination, is the rate high relative to other species? The authors state, “these populations have some of the largest effective recombination rates ever reported for bacteria” – this is unaltered from their first draft. Yet the authors state in their response that they “conflated” rho*l/theta with r/m (with the former giving higher estimates of the impact of recombination) – yet the cited Vos & Didelot study reports r/m, making a comparison of these different statistics very difficult. Given their novel method predicts recombination makes a bigger impact than mutation in the Calyptogena magnifica symbiont, despite it lacking the requisite homologous recombination machinery, can the authors validate this claim, and provide a robust comparison with other bacterial species, using existing software (e.g. ClonalFrameML, chromosome painting or Gubbins)? I am also confused about the statement in the response, “When a gene conversion event occurs on a genealogy, it creates an additional lineage within the converted segment to the right and to the left of the recombination breakpoints.” Why is an additional lineage created either side of a gene conversion event – both the upstream and downstream regions are part of the same clonal frame?

(2) Do the authors think recombination prevents, or causes, genome degradation? The Results section ends, “Thus, ample time has passed for these symbiont genomes to erode, but horizontal transmission and recombination have likely prevented it.” – an inference emphasised in the Conclusion – which suggests to the reader that recombination directly limits genome degradation. Yet in the Results section, the authors suggest that “recombinational processes may partially underlie genome erosion”, as “a rare recombination event induces a deletion that drifts to high frequency” – which is the process of genome degradation. As the authors say in their response, “Whenever we refer to recombination, we mean homologous recombination”, this suggests faster homologous recombination (as seen in some of these endosymbionts) should drive faster genome degradation. Can deletions not be spontaneous, rather than occurring through recombination? Might selection not be more important than recombination in the observed differences?

(3) To what extent is the study describing the evolution of symbionts? The addition of the tree is excellent, and highlights those comparisons between endosymbionts that are sister taxa (and can reasonably be assumed to have maintained an endosymbiotic lifestyle over the period of their divergence), and those that are distantly related, and therefore are likely to have evolved as free-living bacteria over some proportion of their divergence. The authors accurately refer to “Highly divergent solemyid symbionts” in the Results, but it would be more helpful to the reader to point out that it cannot be assumed the divergence occurred as symbionts (unless the many related taxa have all independently evolved to be free living), particularly given the pairwise comparison in Figure 4C is contrasted with that in 4D, where the compared isolates are a single clade, the most recent common ancestor of which was likely an endosymbiont.

I reiterate that I think this is an interesting and valuable study, but given its complexity, it is important to clarify and assess the central message.

Minor points:

In some places in the updated text, “mya” is just “my”.

Reviewer #2: The authors improved the clarity of the manuscript and added a nice analysis of the estimated divergence dates for both the hosts and symbionts. They also answered and implemented all of my questions/suggestions which I appreciated. However, I still believe the ‘forever young’ term in the title is stretching it a bit since this term only refers to the symbionts up to this point in time. It’s difficult to predict what will happen to the symbionts in the future, especially if recombination rates and horizontal transmission decrease. Therefore, they may not be ‘forever young’. I still think this paper is invaluable to the symbiosis field, and should be accepted since very few studies on symbiont population genetics exist from hosts other than insects.

Reviewer #4: This study by Russell et al. uses a population and comparative genomics approach to examine differences in the endosymbiont population dynamics among 3 families of marine bivalves with different modes of symbiont transmission. Each symbiosis is independently evolved with endosymbionts providing chemosynthetic carbon fixation. Overall I like the study and I think it represents a significant contribution to the field of marine bacterial endosymbioses. The data generated represents a valuable resource to the broader scientific community. Additionally, after reading previous reviews I feel like the authors have addressed the most significant issues in the manuscript. I have comments.

It isn’t clear to me (and this could be because I am less familiar with marine endosymbionts) why there seems to be an expectation that these symbiont genomes should undergo the same degree of erosion as insect endosymbionts. 1) The host is gutless, and relies entirely on the symbiont for ALL of its nutritional needs, not just a handful of essential amino acids. This necessarily implies the lower bound for genome size will be much higher than for an obligate terrestrial endosymbiont, wherein drift will rule the drift/selection balance for a much higher fraction of the genome. 2) Line numbers. 3) Drift is much stronger in terrestrial endosymbionts that are never exposed to an outside environment. It seems marine symbionts, even in the case of mostly vertical transmission, are still more exposed to the environment, as they are essentially processing the immediate environment for the host. Additionally, given rates of horizontal transfer, they clearly have to survive in some form outside the host. Whether the transfer events are mediated through burst, unfertilized eggs, the contents of which are then filtered by the gills of a neighbor, or otherwise, this is far beyond the level of exposure that the transovarially transmitted symbionts of internally fertilized terrestrial arthropods will experience. I feel the more apt comparison is among ectosymbionts of terrestrial arthropods. 4) Line numbers. 5) Despite these points, as the previous reviewer pointed out, Muller’s ratchet will ratchet more in the vertically transmitted symbionts, and the data support this (dN/dS, minor allele frequency spectrum). However, the maintenance of diverse coding content likely wouldn’t persist for 10s or 100s of millions of years if it weren’t needed; thus I see degradation to an organellar-like state as somewhat less likely than what occurs in other systems. Perhaps the most surprising aspect is that host-symbiont cospeciation and associations persist for as long as they do without replacements.
6) Line numbers. 7) It could likely be argued that the host takes over much of the workload as the symbiont degenerates, as seen in other systems. Delegation of these functions presumably stops at carbon fixation plus other complex nutritional provisioning steps (like amino acid synthesis). The point of the prior reviewer about the remaining coding capacity is relevant in this context.

Other comments.

Figure 1C- Fst as well as other data being what it is among the symbionts associated with these bivalves, I have no doubt there is gene flow between hosts, as this figure is meant to convey. However, with rampant polytomies in the trees, the degree of transfer might not be nearly as prevalent as this figure suggests (I see several ways of making it more concordant at a glance). I understand that you point this out in the text, and that this figure is more or less supposed to be a toy model to make a point about concordance. However, this figure doesn’t support the point that gene flow erodes host/symbiont concordance, because it tells us little about the relationships that are supposedly “discordant” due to gene flow. Also, gene flow and recombination erode our ability to produce fully resolved phylogenies. These are trees made from populations, with a single consensus likely representing the major allele. With lots of gene flow and recombination, it isn’t likely one would resolve these in any meaningful way at the population level, but a polymorphism-aware model (such as the one implemented in IQ-TREE 2) run on allele frequency data would be a big improvement. The model allows for fixation of heterozygosity with Watterson’s theta or specifying the distribution from which to draw states. As for the mitogenome trees, I would use the ML tree without collapsing branches (maybe partition by codon position in the future); low support is expected with so few segregating sites and a single locus, but knowing the “best” hypothesis is still more informative than polytomies.

Figure 2C- The solemyids seem odd. I understand by looking at the supplementary that this is meant to represent the fact that they are intermediate. However, some individuals of S. pervernicosa almost appear to be under balancing selection? This appears to hold for the minor allele frequency spectrum of individuals as well as Tajima’s D. I guess this would be the case of drift or an individual that had gene flow restricted for a number of generations and then recently had a transfer event. I guess my question is whether you see nonsense mutations on one haplotype background that are compensated for in the alternative haplotype, and vice versa? Since you have nanopore and read-backed phasing data, this might be interesting to look at.

I have attached a version with line numbers that has additional comments.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: The accession codes are not included in the manuscript - the authors state they will be submitted only after acceptance. Given the policies of PLOS journals, it would seem appropriate to request evidence of submission at this stage.

Reviewer #2: No: The authors state that the genomes will be submitted to NCBI directly before publication.

Reviewer #4: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #4: No

Attachment

Submitted filename: PGen.Russell_etal.pdf

Decision Letter 2

Kirsten Bomblies, Xavier Didelot

16 Jun 2020

Dear Dr Russell,

We are pleased to inform you that your manuscript entitled "Horizontal transmission and recombination maintain forever young bacterial symbiont genomes" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional accept, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about one way to make your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Xavier Didelot

Associate Editor

PLOS Genetics

Kirsten Bomblies

Section Editor: Evolution

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-19-01871R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Kirsten Bomblies, Xavier Didelot

21 Jul 2020

PGENETICS-D-19-01871R2

Horizontal transmission and recombination maintain forever young bacterial symbiont genomes

Dear Dr Russell,

We are pleased to inform you that your manuscript entitled "Horizontal transmission and recombination maintain forever young bacterial symbiont genomes" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Kaitlin Butler

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Overview of the genomic data production and analysis steps used to study the population genomic processes influencing endosymbiont genome evolution.

    (TIF)

    S2 Fig. Maximum likelihood phylogenies for host mitochondria (top) and symbionts (bottom).

    Groups of chemosynthetic associations are colored as in Fig 1: yellow = vesicomyids, green = solemyids, and blue = bathymodiolids. Mitochondrial and symbiont trees are rooted by gastropod and alphaproteobacterial outgroups, respectively. Scale bar = substitutions per site. Bootstrap support values indicated at nodes.

    (TIF)

    S3 Fig. Within-host symbiont folded allele frequency spectra for all B. septemdierum and B. childressi intrahost samples with more than 50x and 45x Illumina sequencing coverage, respectively (see S1 Table for coverages and S8 Table for diversity statistics).

    (TIF)

    S4 Fig. Within-host symbiont folded allele frequency spectra for all Solemya velum and Solemya pervernicosa intrahost samples with more than 50x Illumina sequencing coverage (see S1 Table for coverages and S8 Table for diversity statistics).

    (TIF)

    S5 Fig. Within-host folded allele frequency spectra for all Calyptogena fausta and Calyptogena magnifica intrahost samples with at least 50x Illumina sequencing coverage (see S1 Table for coverages and S8 Table for diversity statistics).

    (TIF)
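The folded allele frequency spectra shown in S3 to S5 Figs can be illustrated with a minimal sketch. It assumes each variant site is summarized by the count of one allele among `n` sampled copies, which is a simplification of read-based within-host data. The function name `folded_sfs` is hypothetical and this is not the authors' pipeline (their code is in the repository cited in the Data Availability Statement). Folding means that each site is binned by its minor allele count, so no knowledge of the ancestral state is needed.

```python
from collections import Counter


def folded_sfs(alt_counts, n):
    """Return the folded site frequency spectrum as a list.

    alt_counts: per-site count of one (arbitrary) allele among n sampled copies.
    n: number of sampled copies per site (assumed constant across sites).
    Entry k-1 of the result is the number of sites whose minor allele
    is observed exactly k times, for k = 1 .. n // 2.
    """
    spectrum = Counter()
    for k in alt_counts:
        if not 0 <= k <= n:
            raise ValueError("allele count must lie between 0 and n")
        minor = min(k, n - k)  # fold: the rarer allele defines the bin
        if minor > 0:          # skip monomorphic sites
            spectrum[minor] += 1
    return [spectrum[k] for k in range(1, n // 2 + 1)]


if __name__ == "__main__":
    print(folded_sfs([1, 9, 5, 0, 10, 3], 10))
```

An excess of low-frequency bins relative to intermediate ones is the kind of pattern such a spectrum makes visible; interpreting it still depends on coverage and sampling details that this sketch ignores.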

    S6 Fig. Endosymbiont inheritance modes.

    Our generalized coalescent model of endosymbiont inheritance includes symbiont transmission modes ranging from strict horizontal transmission to strict vertical transmission, with mixed modes, exhibiting both horizontal and vertical strategies. The host populations (grey) undergo Wright-Fisher reproduction. Endosymbiont lineages (red and blue) either switch between host lineages or are inherited, depending on the transmission mode, until they coalesce in the same host lineage (purple).

    (TIF)

    S7 Fig. The observed number of pairwise differences across a range of parameters under the endosymbiont population model described above.

    Each distribution is 100 replicates with varying NH, H, and NS. The expectation following Equation 9 above is plotted as a red line and differs by less than 2 segregating sites from the observed mean for all cases investigated here.

    (TIF)

    S8 Fig. Local alignments suggest that few rearrangements have occurred between the S. velum and S. elarraichensis symbiont genomes.

    The S. elarraichensis symbiont is the closest known relative of the S. velum symbiont; however, material is exceedingly hard to obtain for this association, which occurs at a mud volcano at approximately 500–1000 m depth, and only a fragmented draft genome assembly was available. Even so, these relatively short-range segments reveal complete synteny (left). In comparison, over the same genomic distances, many rearrangements are evident between S. velum and S. pervernicosa (right), with only a minority of segments retaining synteny.

    (TIF)

    S1 Table. Sample, sequencing library, and mapping coverage information.

    The second set of coverages listed for C. fausta apply to the libraries used for the intra-host analysis.

    (XLSX)

    S2 Table. De novo reference assemblies were assembled with Nanopore reads and polished with Illumina data.

    Illumina reads were used for individual sample genotype calling. There were no gaps (Ns) in any of the assemblies. The percent complete measure reflects how many of the 34 "essential genes" (see Materials and Methods) were found in the assembled genomes.

    (XLSX)

    S3 Table. Symbiont species named in this study and named previously.

    See S3 Supporting Text for full diagnoses and descriptions.

    (XLSX)

    S4 Table. Taxa and accession numbers used in the mitochondrial genome phylogenetic analysis and divergence dating.

    (XLSX)

    S5 Table. Taxa and accession numbers used in the bacterial whole genome phylogenetic analysis and divergence dating.

    (XLSX)

    S6 Table. Divergence date estimates from different Beast2, TimeTree, and PATHd8 runs with different parameter values.

    (XLSX)

    S7 Table. Between-host symbiont population statistics calculated from consensus symbiont and mitochondrial genome sequences.

    Random Forest (RF) theta and log10(rho*l) estimates were inferred by fitting genome-wide values of pi, Watterson’s Theta, and 4-gamete sites to values generated in coalescent simulations.

    (XLSX)
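Two of the summary statistics named in S7 Table, pi and Watterson's theta, can be sketched from their standard definitions. The example assumes a small gap-free alignment of equal-length sequences given as strings. The function names are hypothetical, and the code only illustrates the statistics themselves, not the Random Forest fitting or the coalescent simulations used in the study.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that contain more than one state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Pi: mean number of pairwise differences, per site."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in pairs
    )
    return total_diffs / (len(pairs) * length)


def watterson_theta(seqs):
    """Watterson's theta per site: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a_n * len(seqs[0]))


if __name__ == "__main__":
    alignment = ["AAAA", "AAAT", "AATT"]  # toy data, for illustration only
    print(nucleotide_diversity(alignment), watterson_theta(alignment))
```

Under the standard neutral model both quantities estimate the same parameter, so a difference between them is informative; that contrast is what Tajima's D, mentioned in the reviews, is built on.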

    S8 Table. Within-host symbiont and mitochondrial genetic diversity statistics.

    Mapping coverages in S1 Table.

    (XLSX)

    S9 Table. Out-of-bag (oob) scores for random-forest models for each parameter of interest, rho*l and theta, and for each sample size of endosymbiont individuals considered.

    Oob scores indicate how often the trained model is able to predict known values, with perfect prediction equal to one.

    (XLSX)

    S10 Table. Proportion of within-host variant sites that pass the 4-gamete test for recombination based upon read and read pair data over the given genomic intervals (constrained by Illumina library insert sizes).

    (XLSX)
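The four-gamete test referred to in S10 Table rests on a simple idea: under an infinite-sites assumption, observing all four allele combinations at a pair of biallelic sites implies recombination (or recurrent mutation) between them. The sketch below assumes phased haplotypes encoded as strings of '0' and '1'. That is an assumption made for illustration, since the study itself worked from read and read-pair data, and the function name is hypothetical.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return index pairs of sites at which all four gametes are observed.

    haplotypes: equal-length sequences of biallelic states (e.g. '0'/'1').
    Sites are assumed biallelic; with more than two alleles per site,
    counting four distinct combinations would no longer be a valid test.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    recombinant_pairs = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site allele combinations seen across haplotypes.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            recombinant_pairs.append((i, j))
    return recombinant_pairs


if __name__ == "__main__":
    print(four_gamete_pairs(["00", "01", "10", "11"]))
```

A proportion like the one tabulated could then be formed by dividing the number of site pairs returned by the number of pairs examined, although the exact denominator used in the study is not specified here.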

    S11 Table. Comparative symbiont genome statistics and mobile element (ME) content.

    MEs were identified as regions within the symbiont genomes with high sequence identity to elements in insertion sequence, phage, and integrative conjugative element databases.

    (XLSX)

    S12 Table. Full list of ICEberg and ACLAME database mobile element hits with ≥90% sequence identity to endosymbiont genomic regions and genes.

    (XLSX)

    S1 Text. Supplemental text sections 1–3 for Forever young bacterial symbiont genomes.

    (PDF)

    Attachment

    Submitted filename: r2r_ForeverYoung_PLoSGenetics.pdf

    Attachment

    Submitted filename: PGen.Russell_etal.pdf

    Attachment

    Submitted filename: r2r-2-ForeverYoung_PLoSGenetics.pdf

    Data Availability Statement

    Data and genome assemblies generated in this study are available through NCBI BioProject number PRJNA562081 (BioSample numbers listed in S1 Table). Code written and used in our analyses is available from https://github.com/shelbirussell/ForeverYoungGenomes_Russell-et-al. Underlying numerical data for all graphs and summary statistics are available as Supporting Information.


    Articles from PLoS Genetics are provided here courtesy of PLOS
