Abstract
Evidence for gene non-functionalization due to mutational processes is found in genomes in the form of pseudogenes. Pseudogenes are known to be rare in prokaryote chromosomes, with the exception of lineages that underwent an extreme genome reduction (e.g. obligatory symbionts). Much less is known about the frequency of pseudogenes in prokaryotic plasmids; those are genetic elements that can transfer between cells and may encode beneficial traits for their host. Non-functionalization of plasmid-encoded genes may alter the plasmid characteristics, e.g. mobility, or their effect on the host. Analyzing 10 832 prokaryotic genomes, we find that plasmid genomes are characterized by threefold-higher pseudogene density compared to chromosomes. The majority of plasmid pseudogenes correspond to deteriorated transposable elements. A detailed analysis of enterobacterial plasmids furthermore reveals frequent gene non-functionalization events associated with the loss of plasmid self-transmissibility. Reconstructing the evolution of closely related plasmids reveals that non-functionalization of the conjugation machinery led to the emergence of non-mobilizable plasmid types. Examples are virulence plasmids in Escherichia and Salmonella. Our study highlights non-functionalization of core plasmid mobility functions as one route for the evolution of domesticated plasmids. Pseudogenes in plasmids supply insights into past transitions in plasmid mobility that are akin to transitions in bacterial lifestyle.
Graphical Abstract
Introduction
Pseudogenes are gene copies that have been rendered non-functional due to mutations that disrupt their expression and/or function. Non-functionalization events that generate unitary pseudogenes, which lack a functional homolog in the genome, are rather rare (e.g. (1)). In eukaryotes, pseudogenes frequently originate from non-functionalization of gene copies following gene duplication (2). In prokaryotes, protein family diversification is more often mediated by horizontal gene transfer (HGT), which introduces xenologs in the recipient genome, than by gene duplication (i.e. paralogs) (3,4). Nonetheless, acquired genes may become non-functional upon integration into the genome (5), e.g. due to incompatibility with the transcriptional or regulatory network of the recipient organism. Gene families of transposable elements (TEs) (e.g. transposases and site-specific recombinases) are characterized by a high frequency of pseudogenes, indicating that transposition and translocation play a major role in the evolution of pseudogenes (6). Pseudogenes thus correspond to a molecular fossil record of past genes; the sequence similarity between pseudogenes and their functional homolog is expected to decay due to neutral mutational processes over time (7).
The frequency of pseudogenes in prokaryotic genomes is lower compared to eukaryotes, with several exceptions that include the genomes of Mycobacterium leprae, Rickettsia prowazekii (8), as well as Salmonella enterica serovars Typhi and Paratyphi A (9). It has been previously suggested that non-functional DNA in prokaryotes is rapidly eliminated by deletions due to the energetic cost of gene expression (10). The prevalence of pseudogenes in prokaryotes may furthermore be associated with bacterial lifestyles. Genomes of facultative pathogens are characterized by a high pseudogene frequency compared to free-living bacteria (11,12), which may correspond to an initial step in the evolution of genome reduction that is common in the adaptation to an obligate host-associated lifestyle (13).
Plasmids are extrachromosomal elements that play an important role in prokaryote evolution (14). The evolution of plasmids differs from that of bacterial chromosomes due to their relatively small genome size, variable copy number in the cell, and their ability to horizontally transfer between cells. Plasmids can be either self-transmissible (i.e. conjugative), mobilizable, or non-mobilizable, and this property can be inferred from their gene content. Plasmids having the complete set of genetic features required for conjugation, e.g. genes encoding the mating pair formation (MPF) machinery, are classified as self-transmissible; a plasmid is classified as mobilizable if it encodes an oriT binding site or relaxase genes but lacks the MPF machinery gene markers (e.g. (15)). Mobilizable plasmids are able to hijack the conjugation machinery of other plasmids for horizontal transfer (reviewed in (16)). Plasmids lacking an oriT and a relaxase gene are typically classified as non-mobilizable (e.g. (15)). Nonetheless, it has been suggested that plasmids classified as non-mobilizable with a yet unidentified oriT might utilize relaxases of conjugative or mobilizable plasmids for transfer (17). Plasmid mobility is an evolvable property; for example, an evolution experiment on plasmid persistence in Escherichia coli under strong selection for the plasmid-encoded trait demonstrated a loss of plasmid self-transmissibility due to a large deletion of genes required for conjugation (18). Indeed, a recent large-scale study of plasmid genomes suggested that conjugative plasmids frequently lose their self-transmissibility, with consequent changes in the plasmid genetic repertoire due to gene non-functionalization and loss (19). Hence, plasmids that harbor conjugative origins of transfer but lack a relaxase and MPF machinery may have evolved from ancestral conjugative plasmids (17).
Here, we study the extent of gene non-functionalization in plasmid genomes. For that purpose, we compare the prevalence of pseudogenes between plasmids and chromosomes using a large-scale analysis of 10832 publicly available genomes of 738 prokaryote genera. We further study the evolutionary history of plasmid-encoded pseudogenes in 2441 Enterobacteriaceae isolates including the genera Klebsiella, Escherichia and Salmonella. Comparing closely related plasmids (i.e. homologous plasmids), we further reconstruct the origin of pseudogenes in the context of transitions in plasmid mobility.
Materials and methods
Genomes data and gene families
Genomes of isolates comprising plasmids were downloaded from the NCBI RefSeq database (version 8/2022). Genome assemblies containing replicons labeled other than chromosomes or plasmids were excluded from the analysis. Only genomes with a single chromosome were retained. To avoid potential inclusion of experimentally miniaturized genomes or genomes of low quality, we excluded genomes with a chromosome size below the norm of free-living bacterial organisms (<550 kb). This threshold was determined according to the genome size of Mycoplasma genitalium (chromosome size ca. 580 kb), which we considered as a reference for the minimal chromosome size of a free-living prokaryote species (20). The curated data consisted of 10832 genome assemblies from 738 prokaryotic genera (Supplementary Table S1). For a detailed analysis of pseudogenes in plasmids, we used the previously established KES dataset of enterobacterial genomes (21). The KES dataset includes 1114 chromosomes and 3098 plasmids from Escherichia; 755 chromosomes and 2693 plasmids from Klebsiella; 572 chromosomes and 993 plasmids from Salmonella (all from RefSeq version 01/2021; Supplementary Table S2). Protein-coding genes in the KES genome set are clustered into 32623 gene families as previously described in (21). Briefly, reciprocal best hits (RBHs) of protein sequences between all replicon pairs were identified using MMseqs2 (22) (v.13.45111, with module easy-rbh applying a threshold of E-value ≤ 1 × 10⁻¹⁰). RBHs were further compared by global alignment using parasail-python (23) (v.1.2.4, with the Needleman-Wunsch algorithm). Sequence pairs with ≥ 30% identical amino acids were clustered into gene families using a high-performance parallel implementation of the Markov clustering algorithm (24) (HipMCL with parameter --abc -I 2.0).
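The assembly filters described above can be illustrated with a minimal sketch. The `Replicon` record, its `kind` labels and the function name `keep_assembly` are hypothetical and are not taken from the original pipeline; only the 550 kb threshold and the filtering rules come from the text.

```python
from dataclasses import dataclass
from typing import List

# Threshold stated in the text: chromosomes shorter than 550 kb are excluded.
MIN_CHROMOSOME_BP = 550_000


@dataclass
class Replicon:
    kind: str        # hypothetical labels, e.g. "chromosome" or "plasmid"
    length_bp: int


def keep_assembly(replicons: List[Replicon]) -> bool:
    """Return True if an assembly passes the filters described in the text."""
    kinds = [r.kind for r in replicons]
    # Exclude assemblies with replicons other than chromosomes or plasmids.
    if any(k not in ("chromosome", "plasmid") for k in kinds):
        return False
    # Retain only assemblies with exactly one chromosome.
    chromosomes = [r for r in replicons if r.kind == "chromosome"]
    if len(chromosomes) != 1:
        return False
    # The dataset consists of isolates that carry at least one plasmid.
    if "plasmid" not in kinds:
        return False
    # Exclude chromosomes below the minimal size.
    return chromosomes[0].length_bp >= MIN_CHROMOSOME_BP
```

For example, an assembly with one 4.6 Mb chromosome and one 90 kb plasmid passes, whereas an assembly with two chromosomes or with a 400 kb chromosome does not.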
Retrieval of pseudogenes
Pseudogenes were extracted from the RefSeq genomes according to their annotation and coordinates in the RefSeq genome feature tables. Pseudogenes in RefSeq genomes are annotated by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (25). Briefly, PGAP utilizes sequence similarity-based approaches and ab initio gene prediction to annotate genomes (26). Pseudogene annotation is part of the PGAP pipeline for the annotation of protein-coding genes. Genomic loci that have sequence similarity to known genes, yet are lacking evidence of other gene features (e.g. complete open reading frame (ORF)), are annotated as pseudogenes.
The quality of the PGAP pseudogene annotation in the KES dataset was further validated using Pseudofinder (27) (v.1.1.0, with module annotate and parameters -it 0.95 -s 0.8 -e 1e-9). The output of Pseudofinder was filtered to include only pseudogenes identified in intergenic regions. That is, pseudogenes annotated by Pseudofinder in the same locus as functional genes in the PGAP annotation were excluded. All of the PGAP-annotated pseudogenes could be validated using Pseudofinder (see Supplementary text, Supplementary Figures S1 and S2). Pseudogenes identified solely by Pseudofinder were excluded.
All KES pseudogenes were furthermore tested for programmed frameshifts (as described, e.g. for bacterial insertion sequences (ISs) IS1, IS150 and IS911 (28)). To this end, we searched, for each pseudogene, for a similar, larger complete protein sequence in the KES dataset using MMseqs2 (22) (v.13.45111, with module search and parameters: --min-seq-id 0.95 -e 1.000E-09 -c 0.95 --cov-mode 2). The search result was filtered for the best alignment for each pseudogene using MMseqs2 (22) (v.13.45111, with module filterdb and parameter --extract-lines 1). The protein sequences were used as a query for a sequence search against the contig where the corresponding pseudogene was found with BLAST (29) (v.2.12.0+, with module tblastn applying a threshold of E-value ≤ 1 × 10⁻⁹). This procedure yielded 135 (0.03%) pseudogenes that were identical to a complete protein sequence. The annotated function of those protein-coding genes corresponded mostly to hypothetical proteins (115 out of 135). Hence, we find no significant evidence for PGAP misclassification of pseudogenes due to programmed frameshifts of ISs.
Assignment of pseudogenes into gene families
Pseudogenes were assigned to the KES gene families using their nucleotide sequences, which were extracted from genome assemblies according to the genomic coordinates annotated by PGAP. Pseudogenes were aligned with the amino acid sequences of the KES gene families using MMseqs2 (22) (v.13.45111, with module search applying a threshold of E-value ≤ 1 × 10⁻¹⁰). These MMseqs2 parameters trigger an alignment of all six translated reading frames of the pseudogene nucleotide sequence to the target amino acid sequence. Pseudogenes were assigned to the gene family that had the highest number of gene family members significantly aligned to the pseudogene. The final KES dataset encompassed 32623 gene families consisting of 11995860 protein-coding genes and 393350 (96% of the total) pseudogenes in 13375 gene families. The annotation of gene function was extracted from the RefSeq database and gene families were assigned the most frequent annotation among their member genes. The annotation of specific gene families discussed in detail in the results was further validated using sequence search against TnCentral (30) and Pfam (31) using the most frequent sequence variant of the gene families as query. During this analysis, members of a relaxase/nuclease domain-containing gene family were suspected to be ORFs erroneously classified as pseudogenes. The ORFs had a highly conserved synteny with mobilization-related functions such as mobC, mbeB, and mbeD. In addition, homologous ORFs to mbeA (e.g. WP_077943844.1) were identified at the same genomic location, supporting the classification of that locus as a gene rather than a pseudogene. These misclassified pseudogenes in the annotations of ColE1-like plasmids were excluded from the analysis (n = 487, 0.12% of pseudogene annotations discarded from the total set of pseudogenes).
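The family assignment rule (the family with the most significantly aligned members wins) can be sketched as follows. The function name, the input structures and the tie-breaking rule are assumptions made for illustration; the text does not state how ties were resolved.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def assign_family(hit_gene_ids: Iterable[str],
                  gene_to_family: Dict[str, str]) -> Optional[str]:
    """Assign one pseudogene to the family with the most significant hits.

    hit_gene_ids: target genes that passed the E-value threshold for this
                  pseudogene (hypothetical input, e.g. parsed search output).
    gene_to_family: mapping from gene identifier to gene family identifier.
    """
    counts = Counter(gene_to_family[g] for g in hit_gene_ids
                     if g in gene_to_family)
    if not counts:
        return None  # no significant hit: the pseudogene stays unassigned
    best = max(counts.values())
    # Assumption: ties are broken by the smallest family id, only to keep
    # the result deterministic.
    return min(fam for fam, n in counts.items() if n == best)
```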
Inference of dN and dS distances
As a first step, the most similar gene sequence to the pseudogene was identified. For that, pseudogene nucleotide sequences were translated into amino acid sequences using transeq from EMBOSS (32,33) (v.6.6.0.0, with parameters -frame 6 and -table 11). Translated pseudogenes were searched against the KES protein sequences using BLAST (29) (v.2.14.0+, with module blastp). The best hit was used for further comparison. A pairwise alignment between the translated pseudogene and the best matching amino acid sequence was inferred using Clustal Omega (34) (v.1.2.4, --outfmt=clu). Codon alignment was calculated using pal2nal (35) (v.14.1, with parameters -output paml and -codontable 11). The dN and dS distances were calculated using codeml from PAML (36) (v.4.9, with runmode = −2). Additionally, the maximum truncation size of pseudogenes compared to their homologous genes was determined by calculating the leading and trailing truncations between the pseudogene and the best matching homologous gene.
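One possible way to compute leading and trailing truncations is sketched below. It assumes that a truncation appears as a run of terminal gap characters (`-`) in the pseudogene row of the pairwise alignment; this reading, and the function name, are assumptions and not a description of the original scripts.

```python
from typing import Tuple


def terminal_truncations(aligned_pseudogene: str,
                         aligned_gene: str) -> Tuple[int, int, int]:
    """Return (leading, trailing, maximum) truncation in alignment columns.

    Both inputs are rows of the same pairwise alignment, so they must have
    equal length; '-' marks a gap.
    """
    if len(aligned_pseudogene) != len(aligned_gene):
        raise ValueError("alignment rows must have equal length")
    n = len(aligned_pseudogene)
    # Terminal gaps in the pseudogene row are positions covered only by the
    # homologous gene, i.e. sequence missing from the pseudogene.
    leading = n - len(aligned_pseudogene.lstrip("-"))
    trailing = 0 if leading == n else n - len(aligned_pseudogene.rstrip("-"))
    return leading, trailing, max(leading, trailing)
```

Internal gaps are deliberately ignored here, since they reflect indels rather than terminal truncations.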
Inference of unitary and non-unitary pseudogenes
The closest homologous genes of pseudogenes in the KES dataset were inferred using MMseqs2 (22) (v.13.45111, with module search applying thresholds of E-value ≤ 1 × 10⁻¹⁰ and amino acid identity ≥ 30%). Thereby, four scenarios (categories) of the homologous gene locus were examined in the following hierarchical order: (i) on the same replicon, (ii) on the same replicon type, (iii) on a different replicon type and (iv) unitary pseudogene (see illustration in Figure 1B). In the example of plasmid pseudogenes, this analysis determined homologous protein-coding genes from the same plasmid, other plasmids of the same isolate, or the chromosome of the same isolate. If no homolog was found in the same bacterial isolate, the pseudogene was classified as unitary. The same procedure was applied for chromosomal pseudogenes, identifying homologous genes on the same chromosome or on plasmids within the same bacterial isolate.
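The hierarchical order of the four categories can be expressed as a short decision function. This is an illustrative sketch; the function name, the argument layout and the category labels are hypothetical.

```python
from typing import Iterable, Tuple


def classify_pseudogene(pseudo_replicon: str,
                        pseudo_replicon_type: str,
                        homologs: Iterable[Tuple[str, str]]) -> str:
    """Classify a pseudogene by the location of its functional homologs.

    homologs: (replicon_id, replicon_type) of every functional homolog
              found in the same isolate (hypothetical input).
    The checks follow the hierarchical order (i) to (iv) given in the text.
    """
    locations = list(homologs)
    if any(rid == pseudo_replicon for rid, _ in locations):
        return "same replicon"             # (i)
    if any(rtype == pseudo_replicon_type for _, rtype in locations):
        return "same replicon type"        # (ii)
    if locations:
        return "different replicon type"   # (iii)
    return "unitary"                       # (iv) no homolog in the isolate
```

A plasmid pseudogene with homologs on both another plasmid and the chromosome is thus reported as "same replicon type", because category (ii) takes precedence over (iii).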
Figure 1.
Evolution of gene non-functionalization in plasmids compared to chromosomes. (A) Distribution of pseudogene density (pseudogene per CDS) per plasmid and chromosome for the RefSeq and KES datasets. The graph includes plasmids and chromosomes with pseudogene content (RefSeq dataset: 21564 (74%) plasmids and 10832 (100%) chromosomes; KES dataset: 5573 (82%) plasmids and 2441 (100%) chromosomes). An increased pseudogene density in plasmids compared to chromosomes was observed also when analyzing pseudogenes detected with Pseudofinder (see Supplementary text, Supplementary Figure S1). (B) Proportions of pseudogenes according to the location of the closest related homologous CDSs in plasmids and chromosomes. The median sequence similarity between pseudogenes and their homologous genes ranged between 59–87% in the five location categories. Note that the evolution of these homologous pseudogenes and genes in multiple replicons within the same isolate may comprise diverse scenarios (see Supplementary Figure S6). (C) Non-functional to functional ratio per gene family in plasmids and chromosomes. Each dot corresponds to a gene family (green). Transposase gene families are marked (purple). (D) Ratio of pseudogene per CDS in plasmids (x-axis) to pseudogene per CDS in chromosomes (y-axis) for 6396 gene families that occur on both plasmids and chromosomes. Gene functions with a significant enrichment for pseudogenes are highlighted for plasmids (orange, n = 840) and chromosomes (purple, n = 221) (P < 0.001, using Fisher's exact test). Infinite (Inf) values correspond to gene families lacking pseudogenes (i.e. divisions by zero). (E) Example gene families that are enriched for pseudogenes. Names of listed gene families (C–E) have been validated with Pfam (31) and TnCentral (30) databases using sequence search with the most frequent sequence variant of the gene families.
Plasmid typing and inference of closely related (homologous) plasmids
Plasmid mobility type was predicted using MOB-suite (15) (v.3.0.3). The inference of closely related plasmids was based on a new scheme of plasmid taxonomic units (PTUs) (37) and on shared gene content. Plasmids were assigned to PTUs using COPLA (38), supplying taxonomic information of their hosts (Supplementary Table S3). PTUs correspond to a classification scheme of closely related plasmids as inferred from average nucleotide identity (ANI) networks of whole plasmid sequences (38). Additionally, plasmids were grouped into clusters of homologous plasmids based on shared gene content. Pairwise shared gene and pseudogene content of plasmids was calculated with the Jaccard similarity index from the gene and pseudogene families. The shared gene and pseudogene content was visualized with hierarchical clustering using pheatmap (v.1.0.12). Plasmid pairs sharing ≥60% of their genes were retained to construct plasmid clusters using the mcl algorithm (39) (v.12-135, parameter --abc -I 2.0).
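The pairwise comparison can be sketched as below. The sketch assumes that each plasmid is represented by the set of gene families it encodes and that the ≥60% criterion is applied to the Jaccard index; both the representation and the function names are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity index: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(plasmid_families: Dict[str, Set[str]],
                  threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Return plasmid pairs whose shared gene content meets the threshold.

    Each tuple (plasmid_a, plasmid_b, similarity) corresponds to one edge
    of the kind that a label-based ("abc") clustering input would list.
    """
    pairs = []
    for p, q in combinations(sorted(plasmid_families), 2):
        s = jaccard(plasmid_families[p], plasmid_families[q])
        if s >= threshold:
            pairs.append((p, q, s))
    return pairs
```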
Phylogenetic analysis
Plasmid and host phylogenies were constructed from complete single-copy gene families for the analyzed set of plasmids or chromosomes. Sequences were aligned using MAFFT (40) (v.7.520, with parameters --maxiterate 1000 --localpair). The host phylogeny was reconstructed from 133 universal chromosomal genes using IQ-TREE 2 (41) (v.2.2.2.7) with the Le and Gascuel (LG) model based on amino acid sequences (with parameters -m LG -B 1000, (42)). The host phylogeny was rooted at midpoint using iTOL (43) (v.6.8). The reconstruction of plasmid phylogenetic trees was based on nucleotide sequences of complete single-copy genes (see Supplementary Tables S4 and S5) applying IQ-TREE 2 (41) (v.2.2.2.7) with a transition model (with parameters -m TIM -B 1000, (42)). The root position in the plasmid phylogenies was inferred by applying phylogenomic rooting using nucleotide sequences from plasmid gene families (44) (Supplementary Tables S4 and S5). The obtained phylogenetic trees were visualized and annotated using iTOL (43) (v.6.8).
Statistics
All statistical tests were performed using R (v.4.0.3).
Results
Higher density of pseudogenes in plasmids compared to chromosomes
To examine the presence of pseudogenes in plasmids and chromosomes, we surveyed 10832 genome annotations from the Prokaryotic Genome Annotation Pipeline of NCBI’s RefSeq dataset (25). Pseudogenes were annotated in 21564 (74%) plasmids and 10832 (100%) chromosomes. Our results reveal that plasmids are characterized by a higher pseudogene density (pseudogene per CDS) compared to chromosomes (Figure 1A), with a median of 0.070 pseudogenes per coding sequence (CDS) for plasmids and 0.024 for chromosomes (P < 0.001, using Wilcoxon test). This equates to approximately 14 CDS per pseudogene for plasmids and 42 CDS per pseudogene for chromosomes. Thus, the pseudogene density in plasmids is approximately 3-fold higher than in chromosomes.
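The conversion between the reported medians and the derived figures is simple arithmetic, reproduced here as a minimal check. Only the two median densities come from the text; the variable names are illustrative.

```python
# Median pseudogene densities (pseudogenes per CDS) reported in the text.
median_plasmid = 0.070
median_chromosome = 0.024

# CDS per pseudogene is the reciprocal of the density.
cds_per_pseudogene_plasmid = 1 / median_plasmid        # about 14.3
cds_per_pseudogene_chromosome = 1 / median_chromosome  # about 41.7

# Fold difference between the two medians.
fold_difference = median_plasmid / median_chromosome   # about 2.9

print(round(cds_per_pseudogene_plasmid),
      round(cds_per_pseudogene_chromosome),
      round(fold_difference, 1))
```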
The size difference between plasmids and chromosomes raises the question of whether replicon size (i.e. plasmid or chromosome size) may explain the different pseudogene densities. To test the impact of replicon size differences, we constructed a dataset comprising chromosomes and plasmids with similar size distributions (Supplementary Figure S3). Comparing the pseudogene density between plasmids and chromosomes in that set showed that, also for plasmids and chromosomes of similar size, plasmids have a significantly higher pseudogene density compared to chromosomes (P < 0.001, using permutation test; median_plasmid = 0.06, median_chromosome = 0.03). Hence, the smaller plasmid genome size cannot alone explain the high pseudogene density in plasmids.
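A permutation test of this kind can be sketched as follows. The text does not specify the test statistic or the number of permutations, so the difference in medians, the two-sided comparison and the default of 10 000 permutations are all assumptions chosen for illustration.

```python
import random
from statistics import median
from typing import Sequence, Tuple


def permutation_test_median(x: Sequence[float], y: Sequence[float],
                            n_perm: int = 10_000,
                            seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test on the difference in medians.

    Returns (observed difference, estimated P value). Group labels are
    shuffled n_perm times; the P value is the share of shuffles whose
    absolute difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = median(x) - median(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = median(pooled[:n_x]) - median(pooled[n_x:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so that the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

With this estimator the smallest attainable P value is 1/(n_perm + 1), so the number of permutations limits the reportable significance level.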
To test if the high pseudogene density in plasmids is a general phenomenon, we compared the pseudogene density between plasmids and chromosomes within different genera. Our results show that the pseudogene density was consistently higher in plasmids compared to chromosomes for all taxa, except Shigella (P < 0.001, using Wilcoxon test with FDR correction, Supplementary Figure S4). The high frequency of gene non-functionalization in Shigella has been previously associated with adaptation to a host-associated lifestyle in this genus (11,45). Hence, the elevated occurrence of pseudogenes in plasmids is common to most taxonomic groups.
To study plasmid gene non-functionalization in detail, we focused on plasmids residing in 2441 Klebsiella, Escherichia and Salmonella (KES) isolates. These three enterobacterial taxa have been extensively sampled for sequencing and plasmids in these taxa include diverse plasmid types (21,37). The density of pseudogenes in KES plasmids is significantly higher compared to chromosomes (P < 0.001, using Wilcoxon test, Figure 1A). This reflects an observed 4-fold increase in pseudogene density in plasmids compared to chromosomes and equates to approximately 9 CDS per pseudogene in plasmids and 40 CDS per pseudogene in chromosomes. Hence, enterobacterial genomes within the KES dataset capture the trend of increased pseudogene density in plasmids compared to chromosomes that we observed within the taxonomically diverse RefSeq dataset.
Pseudogenes having a functional homolog in the isolate genome, including the chromosome and extra-chromosomal elements, may have evolved from gene non-functionalization following a gene duplication (either within the same replicon, or via translocation of duplicates between replicons). Pseudogenes that do not have a functional homolog within the same isolate may have evolved either from non-functionalization of a single-copy gene, or alternatively, non-functionalization of a horizontally acquired gene upon arrival. The comparison of pseudogenes with their functional homologs showed that most pseudogenes are truncated (289746, 71%). Recent studies suggest that early stop codons in many truncated pseudogenes correspond to nonsense mutations (46), or transient alleles that may be reversed under purifying selection (47). Comparing the evolutionary rates of truncated and non-truncated pseudogenes we find that truncated pseudogenes are characterized by higher dN and dS, as well as higher dN/dS compared to non-truncated pseudogenes (Supplementary Figure S5), as is expected for DNA evolution following a relaxation of selection pressure (i.e. gene non-functionalization). Furthermore, the divergence of pseudogenes in plasmids and chromosomes compared to their closest homologous gene had similar characteristics (Supplementary Figure S5), hence the high density of pseudogenes in plasmids cannot be explained by a bias in the genome annotation pipeline.
Pseudogenes were further classified into two groups: unitary and non-unitary pseudogenes, based on whether a functional homolog to the pseudogene was present within the same isolate. Our results show that the proportion of pseudogenes that are accompanied by a homologous gene on the same replicon is lower for plasmid pseudogenes (Figure 1B). The proportion of unitary pseudogenes at the isolate level is comparable between plasmids (40%) and chromosomes (42%). Plasmid pseudogenes may furthermore have a homologous gene either in the chromosome (19%) or in another plasmid (10%) within the same isolate. Chromosomal pseudogenes having a plasmid homologous gene are rather rare (2%). Plasmid pseudogenes are thus more likely to have a homologous gene on a different replicon within the isolate genome compared to chromosomal pseudogenes. Accordingly, we conclude that gene non-functionalization in plasmids frequently entails the loss of genes that are redundant at the level of the whole isolate genome.
The majority of pseudogenes in plasmids correspond to deteriorated insertion sequences
Which genes are frequently non-functionalized? In plasmids, gene families comprising transposition-related gene functions such as IS1, Tn3 and IS3 are characterized by the highest frequency of pseudogenes (Figure 1C). Similarly, transposases such as those encoded by IS1, IS256 and IS3, are frequently non-functionalized in chromosomes (Figure 1C). Non-functional insertion sequences (ISs) accounted for a substantial portion of pseudogenes in our dataset with 88107 (22%) pseudogene instances. For example, the IS1 family transposase alone comprised approximately every 20th pseudogene (5.85%), with an average of 33 pseudogenes per gene. Furthermore, pseudogenes in the IS1 family transposases are the most abundant in the data and occur in 72% of the chromosomes (Supplementary Figure S7). Our results are thus in agreement with previous reports on frequent non-functionalization of TEs in prokaryote genomes (6) and furthermore show a similar trend in plasmid genomes.
The comparison between plasmids and chromosomes further reveals that the propensity for non-functionalization may be different depending on the replicon type. We identified 221 (0.64%) plasmid-encoded gene families that are significantly enriched for pseudogenes in the chromosome (Figure 1D, Supplementary Table S6); of these families, 84 (38.01%) include a homologous gene in the plasmid within the same isolate (i.e. as category 3 in Figure 1B). These families often correspond to plasmid genes, such as mobilization genes (mobA), toxin-antitoxin systems (relE/parE), and viral helicases (Figure 1E). Additionally, we identified 840 (2.4%) chromosomal gene families that are significantly enriched for pseudogenes in plasmids (Figure 1D, Supplementary Table S6); of these families, 498 (59.28%) have a chromosomal homologous gene in the same isolate. These families include genes such as LysR-type transcriptional regulators (lysR), LacI-related transcriptional regulators (lacI), and creatininase (crnA), that are specifically encoded on the chromosome (Figure 1E). We hypothesize that these gene families correspond to replicon-specific genes that underwent non-functionalization upon translocation to the other replicon, rendering them essentially dead-on-arrival. Barriers for gene transfer from chromosomes to plasmids have been previously shown, for example due to dose effect (termed also dose repetition) (48). The acquisition of an extra copy of a chromosomal protein-coding gene may lead to an increased dose of the product protein, which may have a negative effect on the host fitness. For example, the presence of an additional copy of the core chromosomal chaperonin genes groEL/groES on a plasmid in E. coli was shown to have a negative fitness effect on the plasmid host (48). Our results suggest the presence of similar barriers for gene transfer from plasmids to chromosomes.
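The per-family enrichment test can be illustrated with a small, dependency-free implementation of Fisher's exact test on a 2 × 2 table (pseudogenes and CDSs of one gene family, on plasmids versus chromosomes). The two-sided alternative and the example counts are assumptions; the text states only that Fisher's exact test was used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Illustrative layout (hypothetical counts for one gene family):
        a = pseudogenes on plasmids     b = CDSs on plasmids
        c = pseudogenes on chromosomes  d = CDSs on chromosomes
    The P value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)
```

Since many gene families are tested, the resulting P values would additionally need a multiple-testing correction before families are called enriched.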
Frequent pseudogenes in large mobilizable and non-mobilizable plasmids
Which plasmid types are most associated with a high frequency of pseudogenes? Initially we focus on two prominent plasmid characteristics: plasmid size and mobility class. The small plasmid types in the KES dataset mostly correspond to ColE1-like plasmids; accordingly, here we consider plasmids of <19 kb as small (21). Our analysis shows that most of the large plasmid types (4659, 97%) and about half of the small plasmid types (1045, 54%) contain pseudogenes. The pseudogene density is highest in small non-mobilizable plasmids (small plasmid types with pseudogenes: median_small mobilizable = 0.2, median_small non-mobilizable = 0.33) compared to large plasmid types and chromosomes, but that property is tightly linked to their small size, as they encode only a few genes. Among the large plasmids, pseudogene density is highest in mobilizable plasmids followed by non-mobilizable plasmids (P < 0.05, using pairwise Wilcoxon test with FDR correction; median_large mobilizable = 0.27, median_large non-mobilizable = 0.15, median_large conjugative = 0.1; Figure 2A). Thus, approximately every fourth gene is non-functionalized in some mobilizable plasmids. Taken together, our results point towards an association between increased pseudogene densities in large plasmids and a lack of plasmid mobility.
Figure 2.
Pseudogene density and gene function in large (≥19 kb) plasmid types. (A) Ratio of pseudogene per CDS shown for large plasmid types (≥19 kb) and chromosomes. Replicons lacking pseudogenes were excluded; this includes 94 (3%) conjugative, 9 (1.7%) mobilizable and 61 (5.1%) non-mobilizable plasmids. (B) Enrichment for pseudogenes in gene families significantly depends on the plasmid mobility type in large plasmids (P < 0.05, using Fisher's exact test with FDR correction). The first mentioned plasmid mobility type indicates in which type the enrichment of pseudogenes for gene functions has been statistically observed. The second mentioned plasmid mobility type indicates the type from which the frequencies of pseudogenes and CDSs were used for the statistical comparison. (C) Distribution of pseudogene per CDS ratio for plasmid taxonomic units (PTUs) of large plasmid type (only frequent PTUs (n_PTU ≥ 30) are shown). Stacked bar plots show proportions of plasmid mobility type and the host taxonomy per PTU. Labels beside the boxplot report PTU-specific non-functionalized gene families corresponding to transfer-related (tra) functions (P < 0.05, using Fisher's exact test with FDR correction). PTUs highlighted in red are characterized by multiple plasmid mobility types (where the main plasmid mobility type < 95%).
Plasmids in the different plasmid size and mobility categories typically differ in their gene composition. We therefore compared the propensity of gene non-functionalization among the plasmid size and mobility groups. Only seven gene families were enriched for pseudogenes in the small plasmid types, including antibiotic resistance genes (Supplementary Figure S8, Supplementary Table S7). A total of 488 gene families were enriched for pseudogenes in at least one of the large plasmid mobility classes (Figure 2B). The comparison between large plasmids in different mobility classes revealed a significant enrichment for pseudogenes in gene families encoding plasmid transfer mechanisms, including traI, nikB and traV (Figure 2B, Supplementary Table S8). Consequently, we hypothesized that the origin of these pseudogenes is gene non-functionalization following loss of self-transmissibility of conjugative plasmids.
To gain further understanding of gene non-functionalization events in the context of large plasmid types, we compared the pseudogene density among closely related plasmids as inferred from plasmid taxonomic units (PTUs) (37). PTUs that comprise a high pseudogene density typically include a high proportion of mobilizable and non-mobilizable plasmid types (Figure 2C). Gene deletions and changes in genetic repertoire of plasmids may lead to a transition in plasmid mobility, which may lead to PTU divergence and the evolution of novel PTUs in the long-term (17,19). Indeed, here we observe several PTUs where transfer-related gene families frequently occur as pseudogenes; these PTUs are highlighted as putative plasmid types where transitions in plasmid mobility may have occurred (red labeled PTUs in Figure 2C). Transfer-related pseudogenes thus correspond to molecular fossils of lost mobility mechanisms in plasmid genomes.
Vertical inheritance of pseudogenes in evolutionary-related plasmid clusters
Pseudogenes observed within the same gene family may have originated either from multiple independent gene non-functionalization events, or alternatively, from few non-functionalization events in ancestral plasmids followed by vertical inheritance during plasmid diversification. If vertical inheritance of pseudogenes is common in plasmid evolution, then the number of observed pseudogenes overestimates the frequency of gene non-functionalization events. To evaluate the role of vertical inheritance in pseudogene evolution, we compared shared gene and pseudogene content among closely related plasmids. For that purpose, we grouped all plasmids into clusters of homologous plasmids based on shared gene content. We obtained 353 clusters comprising 85% of the large plasmids and 206 clusters comprising 89% of the small plasmids; these clusters largely correspond to the PTUs (see Supplementary Figure S9). A visual inspection of the shared gene matrices suggests that pseudogenes tend to be shared among plasmids that also share genes (i.e. clusters of closely related plasmids) (Figure 3A, B). Indeed, the proportions of shared genes and shared pseudogenes between plasmid pairs are positively correlated (r_s = 0.47, P < 0.001, using Spearman's correlation).
Figure 3.
Shared gene and pseudogene content are similar in enterobacterial plasmids. (A, B) Pairwise shared gene (red) and pseudogene (blue) matrices of large plasmid types (A) and small plasmid types (B) among the KES dataset. The matrices were sorted using hierarchical clustering of shared gene content. (C) Shared genes and pseudogenes of adjacent plasmid clusters (FSA, FSB, FE and E5) in the hierarchically clustered shared gene content matrix. Labeled plasmid clusters are based on the Markov Cluster Algorithm (MCL) using pairs of plasmids sharing ≥60% gene content (see methods).
The observation of shared pseudogenes in plasmids may be due to ancestral gene non-functionalization and vertical inheritance (i.e. retention) of non-functional DNA. An alternative explanation for shared pseudogene content is independent gene non-functionalization events in a narrow range of gene families. The strongest patterns of shared pseudogenes in our dataset are mostly restricted to closely related plasmid clusters. A total of 874 (10.1%) gene families comprising pseudogenes are plasmid cluster-specific (P < 0.05, using Fisher's exact test with FDR correction). Hence, shared pseudogene content between plasmid clusters is expected to correspond to non-functionalization events in diverse gene families. A total of 356 (4.4%) gene families correspond to frequent, non-functional transposase-related functions such as IS1, Tn3 and IS26. Additionally, gene family size and the frequency of non-cluster-specific pseudogenes are significantly correlated (rs = 0.58, P < 0.001, using Spearman's correlation). Shared pseudogene content between plasmid clusters thus corresponds either to horizontal gene transfer or to commonly non-functionalized gene families.
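A test for cluster-specificity of this kind can be sketched in Python as follows. This is an illustration under explicit assumptions rather than the procedure used in this study: the test is assumed to be one-sided (enrichment within a cluster), the FDR correction is assumed to be the Benjamini-Hochberg procedure, and the family names and counts are hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]].

    Rows: plasmids inside / outside the cluster.
    Columns: pseudogene of the family present / absent.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed
    numer = sum(comb(row1, k) * comb(total - row1, col1 - k)
                for k in range(a, upper + 1))
    return numer / denom

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts per gene family for one plasmid cluster:
# (in cluster with pseudogene, in cluster without,
#  outside with pseudogene, outside without)
tables = {
    "famA": (8, 2, 1, 19),
    "famB": (3, 7, 6, 14),
    "famC": (5, 5, 8, 12),
}

names = sorted(tables)
pvals = [fisher_exact_greater(*tables[f]) for f in names]
qvals = benjamini_hochberg(pvals)
cluster_specific = [f for f, q in zip(names, qvals) if q < 0.05]
print(cluster_specific)
```

Only the family whose pseudogenes are concentrated in the cluster passes the adjusted threshold in this toy example; a two-sided test or a different FDR procedure would change the exact p-values.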
Non-functionalization following segmental deletion is a signature for the loss of self-transmissibility
To gain a better understanding of the evolutionary events underlying the vertical inheritance of pseudogenes, we reconstructed the evolution of closely related plasmid clusters. For that purpose, we selected two clusters of Salmonella plasmids that share a high proportion of their genes and correspond mostly to PTU-FS (Supplementary Table S3, Supplementary Figure S9). The shared gene content within the selected clusters FSA and FSB is 92% and 86%, respectively, with an average of 50% shared genes among plasmids in both clusters (Figure 3C). Plasmids of cluster FSA are homologous to plasmid pSLT (NC_003277.2) of S. enterica serovar Typhimurium and plasmids of cluster FSB are homologous to plasmid pSENV (NZ_CP063701.1) of S. enterica subsp. enterica serovar Enteritidis. Cluster FSA comprises exclusively conjugative plasmids (n = 83), whereas cluster FSB includes mostly (46, 92%) non-mobilizable plasmids, in addition to four conjugative plasmids. The average shared pseudogene content among plasmids within the clusters is 67% and 65% for clusters FSA and FSB, respectively. The shared pseudogene content among plasmids from different clusters was only 9%, on average; thus, plasmids in the two clusters have diverged in their pseudogene content. The high proportion of shared gene (and pseudogene) content among plasmids in the two PTU-FS clusters supports the notion that these plasmids are of common ancestry, i.e. they are homologs, albeit with alterations in their mobility class.
To examine the evolution of PTU-FS plasmids, we reconstructed the plasmid phylogeny based on eleven single-copy gene families present in 122 (92%) plasmids. The phylogeny reveals a clear split between plasmid clusters FSA and FSB (Figure 4A). To reconstruct the ancestral mobility state of PTU-FS plasmids, we inferred the root of PTU-FS using a phylogenomic rooting approach (44). The phylogenetic inference revealed a root neighborhood in the deepest split between plasmid clusters FSA and FSB that includes branches leading to several conjugative plasmids of cluster FSA (Figure 4A, see also Supplementary Figure S10). This root position implies that the ancestor of PTU-FS plasmids was conjugative and the non-mobilizable plasmids in cluster FSB are derived plasmids (Figure 4A). Notably, the plasmid genome size differs significantly between the non-mobilizable cluster FSB plasmids and the conjugative FSA plasmids, with the non-mobilizable plasmids having a smaller genome size (P < 0.001, using Wilcoxon test; median of non-mobilizable plasmids = 59372 bp, median of conjugative plasmids = 93865 bp). Further comparison of the gene content among the PTU-FS plasmids reveals a high conservation of plasmid gene order, except for the absence of transfer (tra) genes in the non-mobilizable FSB plasmids (Figure 5A, Supplementary Figure S11A).
Figure 4.
Phylogenetic relationship of self-transmissible and non-mobilizable plasmid clusters and their hosts. (A, B) Phylogenetic maximum-likelihood trees including 11 single-copy genes of plasmid clusters FSA and FSB (PTU-FS) and 10 single-copy genes of plasmid clusters FE (PTU-FE) and E5 (PTU-E5) (see Supplementary Tables S4 & S5). Three plasmids (ca. 2%) and 40 plasmids (ca. 11%) were excluded from the phylogenetic reconstruction of (A) and (B), respectively, to increase the number of complete single-copy gene families and hence the robustness of the phylogenetic inference. Arrows indicate the inferred root positions resulting from root neighborhood inference (see supporting gene trees in Supplementary Figure S10). The operational taxonomic unit symbols correspond to predicted plasmid mobility type (legend 1). The reconstructed tree branches are colored black (note the short branch length in nearly polytomic ancestral node). The plasmid clusters (PTUs) are shown by the colored lines extending from the tree branches (legend 2). (C) Host phylogenetic tree based on 133 chromosomal single-copy genes. The tree was rooted using midpoint rooting. The outer color strips of the phylogenetic host tree show the cluster affiliation of plasmids residing in the hosts (legend 2) and the host genera (legend 3).
Figure 5.
Pseudogene and gene family Presence Absence Pattern (PAP) of self-transmissible and non-mobilizable plasmid clusters. PAP of plasmid clusters FSA/FSB (A) and FE/E5 (B). The PAPs are sorted using hierarchical clustering of the shared gene family content. (C) Gene content and order in the segmental deletion inferred for the ancestor of FSB (non-mobilizable) plasmids (pointed by an arrow). The inference was based on the transfer-related gene neighborhood in cluster FSA (conjugative) plasmids (see also Supplementary Figure S12).
The loss of the conjugation machinery genes may have followed two alternative scenarios: gradual non-functionalization and gene loss, or a large-scale deletion followed by non-functionalization of functionally related genes. Since the absent genes are consecutively arranged in the ancestral plasmid (Figure 5C), we infer that a large segmental deletion of transfer-related genes occurred in the ancestor of cluster FSB non-mobilizable plasmids (Supplementary Figure S12). The transfer-related genes traV and finO (a repressor of conjugative transfer in F-like plasmids) correspond to the boundaries of the deleted transfer-related gene arrangement in the ancestor and are found as pseudogenes in derived plasmids. Notably, the Salmonella host phylogeny shows that plasmid clusters FSA and FSB reside in distinct groups of Salmonella isolates, where hosts of the conjugative FSA plasmids appear more distantly related to each other than hosts of the non-mobilizable FSB plasmids (Figure 4C, Supplementary Figure S13). Taken together, our results reveal a transition in plasmid mobility from self-transmissible to non-mobilizable plasmids within Salmonella PTU-FS plasmids due to a large segmental deletion followed by gene non-functionalization events.
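The reasoning that consecutively arranged absent genes point to a single segmental deletion can be illustrated with a short Python sketch. It is a simplified illustration, not the inference procedure used in this study: the reference gene order and the gene names are hypothetical placeholders, and the function only checks whether the genes missing from a derived plasmid form one contiguous block in a (circular) reference order.

```python
def missing_blocks(reference_order, present, circular=True):
    """Return runs of consecutive reference genes that are absent from `present`.

    reference_order: gene names in their order on the reference plasmid.
    present: set of genes found intact in the derived plasmid.
    circular: if True, a run spanning the end and start of the list is merged.
    """
    blocks, current = [], []
    for gene in reference_order:
        if gene in present:
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(gene)
    if current:
        blocks.append(current)
    # On a circular replicon the first and last run may be one and the same block
    if (circular and len(blocks) > 1
            and reference_order[0] not in present
            and reference_order[-1] not in present):
        blocks[0] = blocks.pop() + blocks[0]
    return blocks

def is_single_segmental_deletion(reference_order, present, circular=True):
    """True if all missing genes form exactly one contiguous block."""
    return len(missing_blocks(reference_order, present, circular)) == 1

# Hypothetical reference order of a conjugative plasmid (placeholder names)
reference = ["rep", "par", "tra_1", "tra_2", "tra_3", "tra_4", "vir"]

# Derived plasmid A lacks one consecutive run of transfer genes
derived_a = {"rep", "par", "vir"}
# Derived plasmid B lacks scattered genes, as expected under gradual loss
derived_b = {"rep", "tra_1", "tra_3", "vir"}

print(is_single_segmental_deletion(reference, derived_a))  # contiguous block
print(is_single_segmental_deletion(reference, derived_b))  # scattered losses
```

A single contiguous block is compatible with one deletion event, whereas scattered absences would rather suggest several independent losses; pseudogenes at the block boundaries would need to be handled separately in a real analysis.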
To examine further scenarios for gene non-functionalization in plasmid genomes, we investigated two additional adjacent plasmid clusters in the shared gene matrix where the pattern of gene and pseudogene sharing indicates that they are closely related (Figure 3C). The plasmids in these clusters correspond to the evolutionarily related PTUs FE and E5 (19). Most (167, 87%) of the plasmids in cluster FE were assigned to the conjugative PTU-FE, with the remaining plasmids being unassigned (Supplementary Table S3, Supplementary Figure S9). All 164 plasmids in cluster E5 were classified into the non-mobilizable PTU-E5 (Supplementary Table S3, Supplementary Figure S9); these plasmids are homologs of pO157 in E. coli strain O157:H7 and are known to be non-transmissible (17,49). Plasmids in cluster FE share, on average, 53% of their genes and 12% of their pseudogenes. In comparison, plasmids in cluster E5 share more of their content, with 90% of the genes and 80% of the pseudogenes shared, on average. Plasmids in FE and E5 clusters share, on average, 18% gene content and 5% pseudogene content. The shared gene content among plasmids in the two clusters reflects evolutionary relationships where the different plasmid mobility types may be explained by the differential gene content.
To examine the evolutionary history of plasmids in clusters FE and E5, we reconstructed the plasmid backbone phylogeny using ten single-copy gene families present in 316 (89%) plasmids. The phylogeny revealed a clear split between the plasmid clusters (Figure 4B). The topology of cluster E5 plasmids reveals two main plasmid lineages of closely related plasmids. In contrast, the phyletic pattern of cluster FE conjugative plasmids revealed a high divergence among the plasmids (Figure 4B). The inferred root position is located in a branch leading to two conjugative cluster FE plasmids (Supplementary Figure S10). Notably, two plasmids in cluster E5 encode a partial repertoire of the conjugation machinery and they were accordingly predicted as conjugative; these plasmids branch close to the split with cluster FE plasmids and their syntenic gene order reveals differential loss of transfer-related genes (Supplementary Figure S14). Comparing the sequences of the conjugative plasmids of cluster E5 to their closest non-mobilizable neighbor plasmid revealed evidence for loss of plasmid mobility genes and shared genomic arrangements in the non-mobilizable cluster E5 plasmids (Figure 5B, Supplementary Figure S12B). The phylogenetic reconstruction suggests that the ancestral plasmid of cluster E5 plasmids (or PTU-E5) was conjugative and the plasmid mobility was lost due to a segmental deletion of the tra genes (Figure 4B, Supplementary Figure S14). Plasmids in cluster E5 share 80% of their pseudogenes, including transposases such as IS3 (n = 508), IS1 (n = 336), and IS91 (n = 320), as well as transfer-related gene families traI (n = 164) and nikB (n = 164) that are non-functionalized in 164 (98%) of the cluster E5 plasmids. Plasmids in cluster FE reside in a diverse set of hosts including Klebsiella, Escherichia and Salmonella. In contrast, plasmids in cluster E5 are only hosted by closely related Escherichia isolates including strain O157:H7 (Figure 4C); these plasmids are homologs of plasmid pO157 that is non-mobilizable (e.g. NZ_CP017435.1). Note that the hosts of cluster E5 plasmids have diverged, i.e. they do not correspond to isogenic strains (Supplementary Figure S13). Hence the topology of cluster E5 phylogeny likely corresponds to within-host evolution via vertical inheritance. Notably, cluster E5 plasmids are characterized by a significantly larger plasmid genome size, but a smaller number of CDSs compared to the conjugative cluster FE plasmids, which is well explained by the frequent pseudogenes in cluster E5 plasmids (P < 0.001, using Wilcoxon test; median of cluster E5 = 78 CDSs, 19 pseudogenes, 93179 bp; median of cluster FE = 86 CDSs, 9 pseudogenes, 74922.5 bp). Hence pseudogenes may be retained as non-coding DNA in the evolution of the non-mobilizable plasmids after the loss of self-transmissibility. Our results thus reveal that plasmid divergence events may give rise to new integral replicons within their host lineage.
Discussion
Gene non-functionalization is considered a common event following gene duplication and gene acquisition via lateral gene transfer. Previous studies suggested that bacterial genomes are often devoid of pseudogenes due to purifying selection (8). This implies that pseudogenes in bacterial genomes should be considered as ‘garbage DNA’, that is, non-functional DNA that may have an effect on the organism's fitness, rather than ‘junk DNA’ (see (50) for definitions). Our results show that pseudogenes are found in bacterial genomes and that their density is highest in plasmids. We identify two main processes in plasmid evolution that are accompanied by frequent gene non-functionalization: proliferation of mobile genetic elements (MGEs) and loss of plasmid mobility (i.e. reductive plasmid evolution).
Mobile genetic elements, including transposons and integrons, are known to translocate or proliferate within and between genomes (51), thus facilitating gene translocation and transfer, as well as genomic rearrangements (reviewed in (52)). Specifically in conjugative plasmids, ISs are known to mediate and facilitate the transfer of antibiotic resistance genes (53). At the same time, the integration of MGEs in bacterial genomes may lead to gene non-functionalization due to disruption of open reading frames at the insertion locus. Examples are previous reports on IS-mediated inactivation of the arginine biosynthesis pathway in the tsetse fly symbiont Sodalis glossinidius (54) and an inactivation of the plasmid-encoded nitrogen fixation genes (nif) in Bradyrhizobium isolates (55). The high frequency of MGE pseudogenes in plasmids reported here is consistent with previous studies highlighting the role of failed transposition events in the generation of pseudogenes in bacterial genomes (e.g. (6)). We observed that most pseudogenes are due to gene truncations and that pseudogenes in plasmids are more deteriorated. Indeed, site-specific recombination of transposable elements generates genomic hotspots for their insertion, where repeated insertions at the same locus may lead to gene truncations (reviewed in (56)). That being said, recent studies suggest that many pseudogenes are transcribed, and sometimes even translated (57). Pseudogenes of MGE origin may thus correspond to proto-genes and a source of genetic innovation during MGE diversification (47). This suggestion is in line with recent studies that report pseudogene resurrection under strong selective conditions for the lost gene function. Examples include the resurrection of iron uptake in E. coli (58) and the resurrection of CO2 assimilation in Brucella spp. (59).
Our analysis furthermore reveals that the propensity for specific gene non-functionalization depends on the replicon type, i.e. plasmid or chromosome. This observation may be explained by selection against the presence of essential (chromosomal) genes in plasmids, e.g. due to unreliable plasmid inheritance (60) or deleterious effects of gene duplication (48). Thus, the higher pseudogene density in plasmids compared to chromosomes may be explained by the propensity of plasmid gene content to include accessory and dispensable genes. We note, however, that the classification of genes into core and accessory may vary depending on the bacterial taxon, and that this has an effect on the evolution of plasmid gene content in different bacterial taxa (e.g. in the context of antibiotic resistance plasmids; (61)). We hypothesize that the horizontal transfer (or translocation) of chromosomal genes into plasmids (and vice versa) is bound to yield effectively ‘dead-on-arrival’ pseudogenes.
Inferring rooted plasmid phylogenies that comprise related non-mobilizable and conjugative plasmids enabled us to infer the ancestral and derived plasmid variants in plasmid evolution. According to our inference, non-functionalized copies of transfer-related genes in large non-mobilizable and mobilizable plasmids are molecular fossils underlining a self-transmissible plasmid origin. The loss of plasmid self-transmissibility may lead to plasmid ‘domestication’ in the host lineage (or plasmid ‘fixation’ in the host pangenome). Domesticated plasmids are characterized by a stable vertical inheritance within the lineage and can be considered as ancestral states of chromids (62). Examples of the evolution of chromids from essential plasmids are found in plant-associated bacteria such as Sinorhizobium meliloti harboring a chromid, pSYMB, that encodes essential genes (63). Nonetheless, evolution of domesticated plasmids following the acquisition of essential genes was so far not considered in Enterobacteriaceae (48,60). Previous studies reported a stable inheritance of plasmids pO157 in Escherichia and pSLT in Salmonella (64–66). Although pSLT is self-transmissible, it was shown that most Salmonella virulence plasmids are rather vertically inherited except for pSENV of S. enterica subsp. enterica serovar Enteritidis (67). Plasmids of S. Enteritidis bear incomplete tra regions (68) and virulence plasmids of serovars Enteritidis and Typhimurium are not mobilizable by other F-like plasmids even in the presence of an oriT region (69). The results of our analysis support the view that pSENV plasmid homologs were acquired by horizontal transfer subsequent to a segmental deletion of transfer-related genes (the evolution of serovar-specific virulence plasmids is reviewed in (70)).
Furthermore, the phylogeny of isolates hosting pO157- and pSENV-like plasmids indicates that these plasmids were fixed in those lineages following the loss of self-transmissibility (Figure 4C), hence they are better considered as domesticated plasmids (see also (71)).
The pseudogene repertoire and order in large non-mobilizable plasmids indicate that segmental deletions within the conjugation machinery play a major role in the initial steps of transitions in plasmid mobility. Such events may be caused, for example, by a transposon-mediated segmental deletion. A similar event was observed in an experimental evolution study of plasmid pKP33 adaptation to a naïve E. coli host under selection for the plasmid-encoded antibiotic resistance (18). Such a change in the plasmid genomic composition that leads to the loss of self-transmissibility is likely to trigger a ‘domino effect’ of sequential non-functionalization events of other transfer-related genes. Similar dynamics of gene non-functionalization have been previously suggested for the evolution of pseudogene content in obligatory symbionts (72). The loss of self-transmissibility may be deleterious for plasmids due to the inevitable restriction in the host range. Nonetheless, owing to selection acting at the level of the host, plasmids encoding a trait that is essential for the host will be maintained in the population even if their inheritance is unstable (e.g. see (73)). Another commonality in the evolution of plasmids and facultative pathogens is the interplay of MGE proliferation and pseudogene accumulation that is associated with genome miniaturization. Examples of pathogens having a highly reduced (functional) genome size following massive gene decay are Mycobacterium leprae, Yersinia pestis, Shigella flexneri, Rickettsia prowazekii, and Salmonella enterica serovars Paratyphi A and Typhi (11,74–78). Genome miniaturization following gene non-functionalization and loss has been furthermore described in the evolution of obligatory symbionts in heritable symbioses where deleterious mutations can be fixed by genetic drift (reviewed in (79)).
Similar to plasmids, frequent gene non-functionalization in the evolution of symbionts can lead to host range restriction that is accompanied by a shift from facultative to obligatory symbiosis. Co-adaptation of plasmids that lost their mobility and their hosts can lead to the evolution of stable plasmid inheritance (e.g. as observed in experimental plasmid evolution studies (18,80,81)) and eventually plasmid domestication. Reductive genome evolution of plasmids is thus akin to the evolution of obligatory symbionts, and the mechanisms involved in these processes (transposable element proliferation and gradual gene non-functionalization leading to shifts in the plasmid lifestyle) are likely similar.
Acknowledgements
We thank Nils Hülter, Devani Picazo, Fabian Nies, Lisa Hartmann, Johannes Effe, Ishan Bhatt, and Mario Santer for critical comments on the manuscript. We thank Fenna Stücker for graphical abstract illustration. This research was supported in part through high-performance computing resources available at the Kiel University Computing Centre.
Author contributions: D.M.H. and T.D. conceived the study. D.M.H. designed and performed the data analysis and visualizations. Y.W. prepared and provided complementary data. D.M.H. and T.D. interpreted the results and wrote the manuscript with comments and additions from Y.W.
Contributor Information
Dustin M Hanke, Institute of General Microbiology, Kiel University, Kiel, Germany.
Yiqing Wang, Institute of General Microbiology, Kiel University, Kiel, Germany.
Tal Dagan, Institute of General Microbiology, Kiel University, Kiel, Germany.
Data availability
The data underlying this article are available in the article and in its online supplementary material.
Supplementary data
Supplementary Data are available at NAR Online.
Funding
German Science Foundation [RTG 2501 TransEvo, grant number: 456882089]; Leibniz Science Campus [EvoLUNG]; European Research Council [pMolEvol, grant number: 101043835]; China Scholarship Council (CSC scholarship to Y.W.). Funding for open access charge: Leibniz Science Campus [EvoLUNG]
Conflict of interest statement. None declared.
References
- 1. Zhao H., Yang J.-R., Xu H., Zhang J.. Pseudogenization of the umami taste receptor gene Tas1r1 in the giant panda coincided with its dietary switch to bamboo. Mol. Biol. Evol. 2010; 27:2669–2673. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Ohno S. An argument for the genetic simplicity of man and other mammals. J. Hum. Evol. 1972; 1:651–662. [Google Scholar]
- 3. Treangen T.J., Rocha E.P.C.. Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genet. 2011; 7:e1001284. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Tria F.D.K., Martin W.F.. Gene duplications are at least 50 times less frequent than gene transfers in prokaryotic genomes. Genome Biol. Evol. 2021; 13:evab224. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. van Passel M.W.J., Marri P.R., Ochman H.. The emergence and fate of horizontally acquired genes in Escherichia coli. PLoS Comput. Biol. 2008; 4:e1000059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Liu Y., Harrison P.M., Kunin V., Gerstein M.. Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes. Genome Biol. 2004; 5:R64. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Li W.-H., Gojobori T., Nei M.. Pseudogenes as a paradigm of neutral evolution. Nature. 1981; 292:237–239. [DOI] [PubMed] [Google Scholar]
- 8. Lawrence J.G., Hendrix R.W., Casjens S.. Where are the pseudogenes in bacterial genomes?. Trends Microbiol. 2001; 9:535–540. [DOI] [PubMed] [Google Scholar]
- 9. Holt K.E., Thomson N.R., Wain J., Langridge G.C., Hasan R., Bhutta Z.A., Quail M.A., Norbertczak H., Walker D., Simmonds M.et al.. Pseudogene accumulation in the evolutionary histories of Salmonella enterica serovars Paratyphi A and Typhi. BMC Genom. 2009; 10:36. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Kuo C.-H., Ochman H.. The extinction dynamics of bacterial pseudogenes. PLoS Genet. 2010; 6:e1001050. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Jin Q. Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res. 2002; 30:4432–4441. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Moran N.A., Plague G.R.. Genomic changes following host restriction in bacteria. Curr. Opin. Genet. Dev. 2004; 14:627–633. [DOI] [PubMed] [Google Scholar]
- 13. Ochman H., Davalos L.M.. The nature and dynamics of bacterial genomes. Science. 2006; 311:1730–1733. [DOI] [PubMed] [Google Scholar]
- 14. Rodríguez-Beltrán J., DelaFuente J., León-Sampedro R., MacLean R.C., San Millán Á.. Beyond horizontal gene transfer: the role of plasmids in bacterial evolution. Nat. Rev. Microbiol. 2021; 19:347–359. [DOI] [PubMed] [Google Scholar]
- 15. Robertson J., Nash J.H.E.. MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. Microb. Genom. 2018; 4:e000206. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Ramsay J.P., Firth N.. Diverse mobilization strategies facilitate transfer of non-conjugative mobile genetic elements. Curr. Opin. Microbiol. 2017; 38:1–9. [DOI] [PubMed] [Google Scholar]
- 17. Ares-Arroyo M., Coluzzi C., Rocha E.P.C.. Origins of transfer establish networks of functional dependencies for plasmid transfer by conjugation. Nucleic Acids Res. 2023; 51:3001–3016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Porse A., Schønning K., Munck C., Sommer M.O.A.. Survival and evolution of a large multidrug resistance plasmid in new clinical bacterial hosts. Mol. Biol. Evol. 2016; 33:2860–2873. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Coluzzi C., Garcillán-Barcia M.P., de la Cruz F., Rocha E.P.C.. Evolution of plasmid mobility: origin and fate of conjugative and nonconjugative plasmids. Mol. Biol. Evol. 2022; 39:msac115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Glass J.I., Assad-Garcia N., Alperovich N., Yooseph S., Lewis M.R., Maruf M., Hutchison C.A., Smith H.O., Venter J.C.. Essential genes of a minimal bacterium. Proc. Natl. Acad. Sci. U.S.A. 2006; 103:425–430. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Wang Y., Dagan T.. The evolution of antibiotic resistance islands occurs within the framework of plasmid lineages. 2024; bioRxiv doi:21 February 2024, preprint: not peer reviewed 10.1101/2024.02.20.581145. [DOI] [PMC free article] [PubMed]
- 22. Steinegger M., Söding J.. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 2017; 35:1026–1028. [DOI] [PubMed] [Google Scholar]
- 23. Daily J. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinform. 2016; 17:81. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Azad A., Pavlopoulos G.A., Ouzounis C.A., Kyrpides N.C., Buluç A.. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks. Nucleic Acids Res. 2018; 46:e33. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Li W., O’Neill K.R., Haft D.H., DiCuccio M., Chetvernin V., Badretdin A., Coulouris G., Chitsaz F., Derbyshire M.K., Durkin A.S.et al.. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucleic Acids Res. 2021; 49:D1020–D1028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Besemer J., Lomsadze A., Borodovsky M.. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001; 29:2607–2618. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Syberg-Olsen M.J., Garber A.I., Keeling P.J., McCutcheon J.P., Husnik F.. Pseudofinder: detection of pseudogenes in prokaryotic genomes. Mol. Biol. Evol. 2022; 39:msac153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Chandler M., Fayet O.. Translational frameshifting in the control of transposition in bacteria. Mol. Biol. 1993; 7:497–503. [DOI] [PubMed] [Google Scholar]
- 29. Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L.. BLAST+: architecture and applications. BMC Bioinform. 2009; 10:421. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Ross K., Varani A.M., Snesrud E., Huang H., Alvarenga D.O., Zhang J., Wu C., McGann P., Chandler M.. TnCentral: a prokaryotic transposable element database and web portal for transposon analysis. mBio. 2021; 12:e02060-21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Mistry J., Chuguransky S., Williams L., Qureshi M., Salazar G.A., Sonnhammer E.L.L., Tosatto S.C.E., Paladin L., Raj S., Richardson L.J.et al.. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021; 49:D412–D419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Rice P., Longden I., Bleasby A.. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000; 16:276–277. [DOI] [PubMed] [Google Scholar]
- 33. Goujon M., McWilliam H., Li W., Valentin F., Squizzato S., Paern J., Lopez R.. A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 2010; 38:W695–W699. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W., Lopez R., McWilliam H., Remmert M., Söding J.et al.. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 2011; 7:539. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Suyama M., Torrents D., Bork P.. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006; 34:W609–W612. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007; 24:1586–1591. [DOI] [PubMed] [Google Scholar]
- 37. Redondo-Salvo S., Fernández-López R., Ruiz R., Vielva L., De Toro M., Rocha E.P.C., Garcillán-Barcia M.P., De La Cruz F.. Pathways for horizontal gene transfer in bacteria revealed by a global map of their plasmids. Nat. Commun. 2020; 11:3602. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Redondo-Salvo S., Bartomeus-Peñalver R., Vielva L., Tagg K.A., Webb H.E., Fernández-López R., De La Cruz F. COPLA, a taxonomic classifier of plasmids. BMC Bioinform. 2021; 22:390.
- 39. Van Dongen S. Graph clustering via a discrete uncoupling process. SIAM J. Matrix Anal. Appl. 2008; 30:121–141.
- 40. Katoh K., Standley D.M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 2013; 30:772–780.
- 41. Minh B.Q., Schmidt H.A., Chernomor O., Schrempf D., Woodhams M.D., Von Haeseler A., Lanfear R. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 2020; 37:1530–1534.
- 42. Hoang D.T., Chernomor O., Von Haeseler A., Minh B.Q., Vinh L.S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 2018; 35:518–522.
- 43. Letunic I., Bork P. Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021; 49:W293–W296.
- 44. Tria F.D.K., Landan G., Picazo D.R., Dagan T. Phylogenomic testing of root hypotheses. Genome Biol. Evol. 2023; 15:evad096.
- 45. Feng Y., Chen Z., Liu S.-L. Gene decay in Shigella as an incipient stage of host-adaptation. PLoS One. 2011; 6:e27754.
- 46. Belinky F., Ganguly I., Poliakov E., Yurchenko V., Rogozin I.B. Analysis of stop codons within prokaryotic protein-coding genes suggests frequent readthrough events. Int. J. Mol. Sci. 2021; 22:1876.
- 47. Feng Y., Wang Z., Chien K.-Y., Chen H.-L., Liang Y.-H., Hua X., Chiu C.-H. “Pseudo-pseudogenes” in bacterial genomes: proteogenomics reveals a wide but low protein expression of pseudogenes in Salmonella enterica. Nucleic Acids Res. 2022; 50:5158–5170.
- 48. Wein T., Wang Y., Barz M., Stücker F.T., Hammerschmidt K., Dagan T. Essential gene acquisition destabilizes plasmid inheritance. PLoS Genet. 2021; 17:e1009656.
- 49. Lim J.Y., Sheng H., Seo K.S., Park Y.H., Hovde C.J. Characterization of an Escherichia coli O157:H7 plasmid O157 deletion mutant and its survival and persistence in cattle. Appl. Environ. Microbiol. 2007; 73:2037–2047.
- 50. Graur D., Zheng Y., Azevedo R.B.R. An evolutionary classification of genomic function. Genome Biol. Evol. 2015; 7:642–645.
- 51. Plague G.R., Dunbar H.E., Tran P.L., Moran N.A. Extensive proliferation of transposable elements in heritable bacterial symbionts. J. Bacteriol. 2008; 190:777–779.
- 52. Gray Y.H.M. It takes two transposons to tango: transposable-element-mediated chromosomal rearrangements. Trends Genet. 2000; 16:461–468.
- 53. Che Y., Yang Y., Xu X., Břinda K., Polz M.F., Hanage W.P., Zhang T. Conjugative plasmids interact with insertion sequences to shape the horizontal transfer of antimicrobial resistance genes. Proc. Natl. Acad. Sci. U.S.A. 2021; 118:e2008731118.
- 54. Belda E., Moya A., Bentley S., Silva F.J. Mobile genetic element proliferation and gene inactivation impact over the genome structure and metabolic capabilities of Sodalis glossinidius, the secondary endosymbiont of tsetse flies. BMC Genom. 2010; 11:449.
- 55. Arashida H., Odake H., Sugawara M., Noda R., Kakizaki K., Ohkubo S., Mitsui H., Sato S., Minamisawa K. Evolution of rhizobial symbiosis islands through insertion sequence-mediated deletion and duplication. ISME J. 2022; 16:112–121.
- 56. Partridge S.R., Kwong S.M., Firth N., Jensen S.O. Mobile genetic elements associated with antimicrobial resistance. Clin. Microbiol. Rev. 2018; 31:e00088-17.
- 57. Soler-Camargo N.C., Silva-Pereira T.T., Zimpel C.K., Camacho M.F., Zelanis A., Aono A.H., Patané J.S., dos Santos A.P., Guimarães A.M.S. The rate and role of pseudogenes of the Mycobacterium tuberculosis complex. Microb. Genom. 2022; 8:e000876.
- 58. Anand A., Olson C.A., Yang L., Sastry A.V., Catoiu E., Choudhary K.S., Phaneuf P.V., Sandberg T.E., Xu S., Hefner Y. et al. Pseudogene repair driven by selection pressure applied in experimental evolution. Nat. Microbiol. 2019; 4:386–389.
- 59. Varesio L.M., Willett J.W., Fiebig A., Crosson S. A carbonic anhydrase pseudogene sensitizes select Brucella lineages to low CO2 tension. J. Bacteriol. 2019; 201:e00509-19.
- 60. Tazzyman S.J., Bonhoeffer S. Why there are no essential genes on plasmids. Mol. Biol. Evol. 2014; 32:3079–3088.
- 61. Wang Y., Batra A., Schulenburg H., Dagan T. Gene sharing among plasmids and chromosomes reveals barriers for antibiotic resistance gene transfer. Phil. Trans. R. Soc. B. 2022; 377:20200467.
- 62. Vial L., Hommais F. Plasmid-chromosome cross-talks. Environ. Microbiol. 2020; 22:540–556.
- 63. Galibert F., Finan T.M., Long S.R., Pühler A., Abola P., Ampe F., Barloy-Hubler F., Barnett M.J., Becker A., Boistard P. et al. The composite genome of the legume symbiont Sinorhizobium meliloti. Science. 2001; 293:668–672.
- 64. Makino K. Complete nucleotide sequences of 93-kb and 3.3-kb plasmids of an enterohemorrhagic Escherichia coli O157:H7 derived from Sakai outbreak. DNA Res. 1998; 5:1–9.
- 65. Lim J.Y., Yoon J., Hovde C.J. A brief overview of Escherichia coli O157:H7 and its plasmid O157. J. Microbiol. Biotechnol. 2010; 20:5–14.
- 66. Lobato-Márquez D., Molina-García L., Moreno-Córdoba I., García-del Portillo F., Díaz-Orejas R. Stabilization of the virulence plasmid pSLT of Salmonella Typhimurium by three maintenance systems and its evaluation by using a new stability test. Front. Mol. Biosci. 2016; 3:00066.
- 67. Feng Y., Liu J., Li Y.-G., Cao F.-L., Johnston R.N., Zhou J., Liu G.-R., Liu S.-L. Inheritance of the Salmonella virulence plasmids: mostly vertical and rarely horizontal. Infect. Genet. Evol. 2012; 12:1058–1063.
- 68. Rotger R., Casadesús J. The virulence plasmids of Salmonella. Int. Microbiol. 1999; 2:177–184.
- 69. Ou J.T., Lin M.-Y., Chao H.-L. Presence of F-like OriT base-pair sequence on the virulence plasmids of Salmonella serovars Gallinarum, Enteritidis, and Typhimurium, but absent in those of Choleraesuis and Dublin. Microb. Pathog. 1994; 17:13–21.
- 70. Rychlik I., Gregorova D., Hradecka H. Distribution and function of plasmids in Salmonella enterica. Vet. Microbiol. 2006; 112:1–10.
- 71. Mebrhatu M.T., Cenens W., Aertsen A. An overview of the domestication and impact of the Salmonella mobilome. Crit. Rev. Microbiol. 2014; 40:63–75.
- 72. Dagan T., Blekhman R., Graur D. The “Domino Theory” of gene death: gradual and mass gene extinction events in three lineages of obligate symbiotic bacterial pathogens. Mol. Biol. Evol. 2006; 23:310–316.
- 73. Wein T., Wang Y., Hülter N.F., Hammerschmidt K., Dagan T. Antibiotics interfere with the evolution of plasmid stability. Curr. Biol. 2020; 30:3841–3847.
- 74. Cole S.T., Eiglmeier K., Parkhill J., James K.D., Thomson N.R., Wheeler P.R., Honoré N., Garnier T., Churcher C., Harris D. et al. Massive gene decay in the leprosy bacillus. Nature. 2001; 409:1007–1011.
- 75. McClelland M., Sanderson K.E., Clifton S.W., Latreille P., Porwollik S., Sabo A., Meyer R., Bieri T., Ozersky P., McLellan M. et al. Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat. Genet. 2004; 36:1268–1274.
- 76. Parkhill J., Wren B.W., Thomson N.R., Titball R.W., Holden M.T.G., Prentice M.B., Sebaihia M., James K.D., Churcher C., Mungall K.L. et al. Genome sequence of Yersinia pestis, the causative agent of plague. Nature. 2001; 413:523–527.
- 77. Andersson J.O., Andersson S.G.E. Pseudogenes, junk DNA, and the dynamics of Rickettsia genomes. Mol. Biol. Evol. 2001; 18:829–839.
- 78. Wei J., Goldberg M.B., Burland V., Venkatesan M.M., Deng W., Fournier G., Mayhew G.F., Plunkett G., Rose D.J., Darling A. et al. Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T. Infect. Immun. 2003; 71:2775–2786.
- 79. Bennett G.M., Moran N.A. Heritable symbiosis: the advantages and perils of an evolutionary rabbit hole. Proc. Natl. Acad. Sci. U.S.A. 2015; 112:10169–10176.
- 80. Harrison E., Guymer D., Spiers A.J., Paterson S., Brockhurst M.A. Parallel compensatory evolution stabilizes plasmids across the parasitism-mutualism continuum. Curr. Biol. 2015; 25:2034–2039.
- 81. San Millan A., Escudero J.A., Gifford D.R., Mazel D., MacLean R.C. Multicopy plasmids potentiate the evolution of antibiotic resistance in bacteria. Nat. Ecol. Evol. 2016; 1:0010.
Associated Data
Data Availability Statement
The data underlying this article are available in the article and in its online supplementary material.






