Abstract
Horizontal gene transfer (HGT) is a fundamental process in prokaryotic evolution, contributing significantly to diversification and adaptation. HGT is typically facilitated by mobile genetic elements (MGEs), such as conjugative plasmids and phages, which often impose fitness costs on their hosts. However, a considerable number of bacterial genes are involved in defense mechanisms that limit the propagation of MGEs, suggesting they may actively restrict HGT. In our study, we investigated whether defense systems limit HGT by examining the relationship between the HGT rate and the presence of 73 defense systems across 12 bacterial species. We discovered that only six defense systems, three of which were different CRISPR-Cas subtypes, were associated with a reduced gene gain rate at the species evolution scale. Hosts of these defense systems tend to have a smaller pangenome size and fewer phage-related genes compared to genomes without these systems. This suggests that these defense mechanisms inhibit HGT by limiting prophage integration. We hypothesize that the restriction of HGT by defense systems is species-specific and depends on various ecological and genetic factors, including the burden of MGEs and the fitness effect of HGT in bacterial populations.
Keywords: antiphage defense systems, CRISPR-Cas, horizontal gene transfer, mobile genetic elements, prophages
Graphical Abstract

Horizontal gene transfer (HGT) plays a crucial role in the evolution of prokaryotes. This process is facilitated by mobile genetic elements (MGEs) but can be hindered by the activity of anti-MGE defense systems. Our research demonstrates that only a specific subset of these defense systems in certain bacterial taxa substantially reduces HGT.
Introduction
Bacterial viruses, known as bacteriophages (phages, for short), are the most abundant entities in the biosphere (Keen, 2015). They regularly attack and prey on bacterial populations across different ecological settings, with an estimated rate of infection in the oceans alone on the order of 10^23 per second (Suttle, 2007; Mushegian, 2020). To counteract phages and other parasitic mobile elements, bacteria evolved a wide range of defense systems with various molecular mechanisms of action (Doron et al., 2018; Gao et al., 2020; Bernheim et al., 2021; Millman et al., 2022; Georjon and Bernheim, 2023). These include CRISPR-Cas systems, which provide adaptive immunity by storing information about past encounters with MGEs (Makarova et al., 2020), restriction-modification (RM) systems that degrade foreign genetic material based on specific molecular patterns (Wilson, 1991), abortive infection mechanisms that limit the spread of phages in the bacterial population by inducing the suicide of infected cells (Lopatina et al., 2020), and multiple others. Individual bacterial genomes typically encode several diverse defense systems, and the repertoire of defense mechanisms can differ even among closely related strains (Bernheim and Sorek, 2020; Tesson et al., 2022). Consequently, defense systems demonstrate high mobility, with high rates of gene gain and loss on a short evolutionary scale (Makarova et al., 2013; Puigbo et al., 2017).
Although defense systems are essential for protection against phages, and to a lesser extent, against other invasive MGEs, such as integrative conjugative elements (ICEs) and plasmids (Deep et al., 2022; Jaskolska et al., 2022; Weiss et al., 2023), they also come with associated fitness costs to the hosts. One such cost is the impediment to lysogenic conversion and to the gain of beneficial genes that reside in MGEs. These include genes that equip bacteria with the capability to adapt to different ecological niches (Kelleher et al., 2017; Davray et al., 2021; Kieft et al., 2021) and to resist environmental stress (Lopatkin et al., 2017; Jahn et al., 2019). For example, the presence of the CRISPR-Cas system in Enterococcus faecalis shows a significant inverse correlation with the resistance to different antibiotics (Palmer and Gilmore, 2010). Moreover, multiple experimental studies have demonstrated the capability of CRISPR-Cas systems to limit horizontal gene transfer (HGT) (Marraffini and Sontheimer, 2008; Bikard et al., 2012). However, broader comparative genomic analyses yielded conflicting conclusions on the inhibition of HGT by CRISPR-Cas on a larger evolutionary scale (Gophna et al., 2015; Shehreen et al., 2019; Wheatley and MacLean, 2021). Furthermore, the potential interference of other defense systems with HGT has not been comprehensively analyzed.
In this work, we examined the association between the presence of various defense systems and the rates of gene gain in a set of 12 bacterial species. Our results reveal a significant association with increased gene gain rate for 15 defense systems, whereas 6 systems were found to be significantly associated with reduced gene gain rates. However, we show that for the 15 defense systems associated with increased gene gain rates, this signal is likely a byproduct of their location within large MGEs. Conversely, 3 of the 6 defense systems that are negatively correlated with the gene gain are CRISPR-Cas variants that tend to inhibit gene gain by reducing prophage integration.
Results
Only a few defense systems are associated with a reduced gene gain rate
We analyzed a dereplicated set of 2,546 complete genomes from 12 bacterial species. Each species was represented by a minimum of 118 genomes, with the count ranging from 118 for Streptococcus pyogenes to 430 for Escherichia coli (Supplementary Table S1). Altogether, we identified 835 distinct defense systems, but to make the subsequent analysis informative, we focused on the 73 systems that were found to be encoded in at least 20% of the genomes in a species, but not more than 80% (Supplementary Figure S1).
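The prevalence filter described above can be illustrated with a minimal sketch. The function name, the input layout (a mapping from system name to the set of genomes encoding it), and the toy data are hypothetical and are not taken from the authors' scripts; the 20% and 80% bounds are treated as inclusive, following the wording "at least 20% ... but not more than 80%".

```python
def informative_systems(presence, n_genomes, low=0.20, high=0.80):
    """Return defense systems whose prevalence in a species lies within [low, high].

    presence: dict mapping system name -> set of genome IDs encoding it (hypothetical layout).
    """
    kept = []
    for system, genomes in presence.items():
        prevalence = len(genomes) / n_genomes
        if low <= prevalence <= high:
            kept.append(system)
    return sorted(kept)


# Hypothetical toy species with 10 genomes (g0..g9).
toy_presence = {
    "SystemA": {"g0", "g1", "g2"},            # 30% of genomes: retained
    "SystemB": {"g0"},                        # 10% of genomes: too rare
    "SystemC": {f"g{i}" for i in range(9)},   # 90% of genomes: too common
}
kept = informative_systems(toy_presence, n_genomes=10)
print(kept)
```

Systems that are nearly absent or nearly fixed in a species carry little contrast between carriers and non-carriers, which is presumably why such a window makes the downstream correlation analysis informative.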
To estimate the gene gain rate for each species, we mapped the ancestral state of every orthologous group using GLOOME (Cohen et al., 2010) onto the individual species trees, which were computed from the concatenated alignments of the core genes (see Experimental Procedures for the details). GLOOME estimates the probability of the presence of each orthologous group in the ancestral nodes, taking a value from 0 to 1. We then calculated changes in the probabilities across all individual ancestral states from the root to the tips of phylogenetic trees. The positive differences were retained and summed up across all orthologous groups to obtain the overall gene gain estimation for every branch in the phylogenies. Additionally, the combined gene gain estimates were normalized by their respective branch lengths. For the defense systems, we also conducted the ancestral state reconstruction using GLOOME (Cohen et al., 2010), and denoted the state of every branch in phylogenetic trees as the average of the states between its ancestral and descendant nodes. To determine whether there was a connection between individual defense systems and gene gain rate, we calculated Spearman’s correlation coefficient between them (Supplementary Figure S2). The computational pipeline developed for this analysis is shown in Figure 1 and a detailed numerical example is illustrated in Supplementary Figure S3.
Figure 1. The pipeline used to calculate the association between the presence of defense systems and gene gain rate.

In the initial step, the gene gain rate is calculated for each branch of the phylogenetic tree. Subsequently, the probabilistic state of a defense system is mapped onto the phylogenetic tree. In the final step, the Spearman’s correlation coefficient between the obtained estimates is computed.
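The first two steps of the pipeline can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and the toy probabilities are hypothetical, and the inputs are assumed to be GLOOME-style presence probabilities (between 0 and 1) at the ancestral and descendant nodes of a single branch.

```python
def branch_gene_gain(parent_probs, child_probs, branch_length):
    """Gene gain rate for one branch.

    Sums the positive changes in presence probability over all orthologous
    groups and normalizes the total by the branch length.
    """
    if branch_length <= 0:
        raise ValueError("branch_length must be positive")
    gain = sum(max(child - parent, 0.0)
               for parent, child in zip(parent_probs, child_probs))
    return gain / branch_length


def branch_defense_state(parent_prob, child_prob):
    """State of a defense system on a branch: mean of the two node states."""
    return (parent_prob + child_prob) / 2.0


# Hypothetical branch with three orthologous groups.
parent = [0.10, 0.90, 0.50]   # presence probabilities at the ancestral node
child = [0.80, 0.20, 0.50]    # presence probabilities at the descendant node
rate = branch_gene_gain(parent, child, branch_length=0.02)
state = branch_defense_state(0.2, 1.0)
print(rate, state)
```

In this toy branch only the first orthologous group contributes to the gain (an increase of 0.7); the second group decreases and is ignored, so losses never offset gains.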
Of the 73 analyzed defense systems, 6 were found to be significantly associated with a reduced gene gain rate (Spearman’s rho < −0.3; permutation test p-value < 0.0001), whereas 15 were significantly associated with an increased gene gain rate (Spearman’s rho > 0.3; permutation test p-value < 0.0001) (Figure 2, Supplementary Table S2). Notably, among the 6 defense systems associated with a reduced gene gain rate, 3 were CRISPR-Cas variants (out of the 9 CRISPR-Cas variants included among the 73 analyzed defense systems). Although these are not large numbers, the over-representation of CRISPR-Cas systems among those associated with a reduced gene gain rate was statistically significant (Fisher’s exact test, p-value = 0.0221).
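The Fisher's exact test p-value can be reproduced from the counts given above. The sketch below assumes a 2x2 table inferred from the text (9 CRISPR-Cas variants among the 73 systems; 3 CRISPR-Cas variants among the 6 systems associated with a reduced gene gain rate) and a one-sided alternative; both the table layout and the function name are assumptions, not details stated in the source.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n items without replacement from N items,
    K of which are 'successes' (one-sided Fisher's exact test)."""
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return favorable / total


# N = 73 systems, K = 9 CRISPR-Cas variants,
# n = 6 systems with reduced gene gain, k = 3 of them CRISPR-Cas.
p_value = hypergeom_upper_tail(k=3, K=9, n=6, N=73)
print(round(p_value, 4))  # 0.0221
```

Under these assumptions the tail probability rounds to 0.0221, in agreement with the reported value.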
Figure 2. Heatmap of the correlation coefficients between defense systems and gene gain rate.

Positive correlation coefficients are shown with red hues, and negative correlation coefficients are shown with blue hues. Significant associations are highlighted with black rectangles.
Given that HGT plays a key role in the expansion of the accessory genome in bacteria (Ochman et al., 2000; Nowell et al., 2014; Props et al., 2019), we hypothesized that defense systems associated with a reduced gene gain rate would be present in pangenomes of a relatively smaller size, and vice versa. To test this hypothesis, we counted the number of protein-coding genes and conducted pairwise comparisons of the genomes containing and lacking the respective defense systems. Indeed, we found that defense systems that are associated with a reduced gene gain rate tend to be present in genomes that encode fewer proteins, and in 3 of the cases, this difference was statistically significant (Mann-Whitney U test, p-value < 0.05) (Figure 3). To account for the phylogenetic signal, we also performed phylogenetic logistic regression (Ives and Garland, 2010), and in all 3 cases, the inverse relationship remained significant (p-value < 0.05). In contrast, we expected that genomes carrying the defense systems associated with an elevated gene gain rate would contain more protein-coding genes. Indeed, in each of the 15 such cases, genomes that harbored these defense systems had significantly larger pangenome size compared to genomes lacking them (Mann-Whitney U test, p-value < 0.05) (Supplementary Figure S4), and the statistical significance persisted after applying the phylogenetic correction (p-value < 0.05).
Figure 3. Relationship between the 6 defense systems associated with a reduced gene gain rate and the number of protein-coding genes in genomes.

Boxplots show median values that are bound by the first and third quartiles. Whiskers extended outside boxplots are within the 1.5 * interquartile range. Black dots outside the whiskers denote outliers. Red asterisks show significant differences between boxplots (Mann-Whitney U test, p-value < 0.05).
We observed that 3 CRISPR-Cas systems, including CRISPR-Cas-I-F in Pseudomonas aeruginosa, were significantly associated with a reduced gene gain rate, corroborating previous work (Wheatley and MacLean, 2021). By contrast, 6 CRISPR-Cas systems, including CRISPR-Cas-I-E in Escherichia coli and Salmonella enterica, exhibited no significant link with the gene gain rate, also in agreement with previous findings (Gophna et al., 2015; Shariat et al., 2015; Xue and Sashital, 2019) (Figure 2). As a proxy for the CRISPR-Cas activity, we calculated the spacer turnover rate in the respective species (see Experimental Procedures for the details). Active CRISPR-Cas systems tend to continuously acquire new spacers (Paez-Espino et al., 2013), whereas old spacers are prone to be lost owing to inactivity and the pronounced deletion bias in bacteria (Mira et al., 2001). Thus, more active CRISPR-Cas systems are likely to have a higher spacer turnover rate. Indeed, although the signal was not particularly robust, we found that species where CRISPR-Cas was associated with a reduced gene gain rate tended to have a higher spacer turnover rate (Supplementary Figure S5; Supplementary Table S3).
Defense systems reduce the gene gain rate by limiting the acquisition of phage-like genes
Given that the primary role of defense systems in bacteria is to limit the propagation of phage infection and, to a lesser extent, the spread of ICEs and plasmids, we hypothesized that defense systems that are associated with a reduced gene gain rate actively hinder the spread of MGEs. Consequently, genomes that encompass such defense systems can be expected to harbor fewer phage-related genes than other genomes. Indeed, our findings show that the genomes carrying 4 of the 6 defense systems that are associated with a reduced gene gain rate, including all 3 CRISPR-Cas systems, contained a significantly lower number of phage-related genes compared to genomes that lack such defense systems (Mann-Whitney U test, p-value <0.05) (Figure 4). These results are also supported by the phylogenetic logistic regression (p-value < 0.05). We conjectured that these CRISPR-Cas systems play a direct role in restricting the acquisition of phage-related genes, that is, curtail prophage integration. To test this hypothesis, we investigated whether the spacers in these CRISPR-Cas systems targeted phage genes present in the respective species. For that purpose, we pooled all detected phage-related genes encoded in genomes of the respective species and found that spacers target a range of 1.48% to 6.71% of all phage-related genes (Supplementary Figure S6A). Notably, phage-related genes targeted by spacers are predominantly found in genomes lacking the 3 CRISPR-Cas systems associated with a reduced gene gain rate (Supplementary Figure S6B). These observations are compatible with the possibility that these CRISPR-Cas systems actively inhibit the acquisition of prophages.
Figure 4. Relationship between the 6 defense systems associated with a reduced gene gain rate and the number of phage-related genes in genomes.

Boxplots show median values that are bound by the first and third quartiles. Whiskers extended outside boxplots are within the 1.5 * interquartile range. Black dots outside the whiskers denote outliers. Red asterisks show significant differences between boxplots (Mann-Whitney U test, p-value < 0.05).
We also analyzed the number of phage-related genes in genomes that harbored defense systems associated with increased gene gain rate. The results indicated that genomes carrying such defense systems contained more phage-related genes than genomes without these systems, and in all 15 cases, that difference was statistically significant (Supplementary Figure S7), as supported by phylogenetic logistic regression (p-value < 0.05).
Defense systems are linked to mobile genetic elements
Because defense systems tend to be colocalized with MGEs in prokaryotic genomes (Makarova et al., 2011), we hypothesized that the connection between some defense systems and an increased gene gain rate is a byproduct of horizontal transfer of large ICEs and plasmids carrying these defense systems. To address this possibility, we conducted phylogenetic profiling of 47 defense systems whose constituent genes form orthologous groups across the genomes in examined species. To minimize false positive associations due to the phylogenetic signal, we discarded clade-specific defense systems and the corresponding groups of orthologous genes (see Experimental Procedures for the details). For every pair of orthologous groups, we calculated the observed and expected counts of occurrence in a genome and evaluated the significance using the Bonferroni-corrected binomial test (Whelan et al., 2020). If defense systems and co-occurring orthologous groups shared evolutionary and functional connections, they would be expected to co-localize in bacterial genomes. We calculated the median genomic distance between defense systems and co-occurring orthologous gene groups and compared them with the distribution expected by chance that was generated by random sampling of the orthologous gene groups encoded in the main chromosome. Indeed, we found that of the 782 identified associations, 725 orthologous gene groups substantially colocalized with the defense systems (p-value < 0.1).
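The comparison of observed and expected co-occurrence counts can be illustrated with a simplified sketch. It assumes independence between the two gene families as the null model and a one-sided binomial test with Bonferroni adjustment; it is not a reimplementation of Coinfinder, and the function names and toy genome sets are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def cooccurrence_test(genomes_a, genomes_b, n_genomes, n_tests):
    """Observed vs expected co-occurrence of two gene families across genomes.

    The expected count assumes that the two families are distributed
    independently; the raw p-value is multiplied by n_tests (Bonferroni).
    """
    observed = len(genomes_a & genomes_b)
    p_joint = (len(genomes_a) / n_genomes) * (len(genomes_b) / n_genomes)
    expected = n_genomes * p_joint
    p_raw = binomial_upper_tail(observed, n_genomes, p_joint)
    return observed, expected, min(1.0, p_raw * n_tests)


# Hypothetical species with 20 genomes: a defense gene in 5 genomes and an
# orthologous group in 6 genomes, 5 of which also carry the defense gene.
defense = {f"g{i}" for i in range(5)}
ortholog = {f"g{i}" for i in range(6)}
observed, expected, p_adj = cooccurrence_test(defense, ortholog, 20, n_tests=2)
print(observed, expected, p_adj)
```

Here the pair co-occurs in 5 genomes where about 1.5 would be expected by chance. A test of this kind does not account for shared ancestry, which is why the clade-specific systems and orthologous groups are discarded beforehand.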
Furthermore, our findings indicate that defense systems associated with the increased gene gain rate co-occurred with significantly more orthologous gene groups compared to other defense systems (Mann-Whitney U test, p-value < 0.05). A closer examination of the orthologous gene groups over-represented in the genomic neighborhoods of defense systems suggested that these neighborhoods predominantly represented ICEs and large extrachromosomal plasmids, as exemplified by JukAB, RM-Type-IV and Wadjet systems in Pseudomonas aeruginosa (Figure 5A). Consequently, the association between these defense systems and an increased gene gain rate could be attributed to the horizontal movement of MGEs that harbor such defense systems along with the co-occurring genes. Moreover, in agreement with prior findings (Makarova et al., 2011; Botelho, 2023), 19 of the 26 defense systems with at least one associated orthologous gene group were linked to genes typical of MGEs including phages, ICEs, transposons, and integrons. Except for CRISPR-Cas-I-E in Klebsiella pneumoniae, CRISPR-Cas systems shared no phylogenetic co-occurrence with other orthologous gene groups (Figure 5B).
Figure 5. Gene neighborhoods of (A) JukAB, RM-Type-IV, and Wadjet in Pseudomonas aeruginosa, and (B) CRISPR-Cas-I-E in Klebsiella pneumoniae in representative genomes.

Phylogenetic trees are scaled with the number of substitutions per site, and red branches denote clades that contain discussed defense systems.
Discussion
Experiments have clearly demonstrated that bacterial defense systems can actively limit the propagation of MGEs such as phages and plasmids under laboratory conditions (Marraffini and Sontheimer, 2008; Dupuis et al., 2013; Deep et al., 2022). However, whether these systems provide any substantial barrier to horizontal gene acquisition on the larger evolutionary scale remains controversial (Gophna et al., 2015; Wheatley and MacLean, 2021). Our results indicate that the majority of the defense systems are not associated with either an increased or a decreased rate of gene acquisition. However, a minority of the defense systems are associated with an elevated gene gain rate, and an even smaller subset is associated with a reduced gene gain rate.
While certain CRISPR-Cas systems, Gabija and RM-Type II are associated with a reduced gene gain rate in some species, in other species, they do not exhibit such association. Thus, the effects of defense systems on HGT appear to be strongly lineage-specific and depend on additional factors. Integrated phages and plasmids can impose various metabolic and other fitness costs on their hosts contingent upon ecological and genetic contexts (Alonso-Del Valle et al., 2021; Rendueles et al., 2023). Hence, there is likely a differential selection pressure on the same types of defense systems in different species to hinder the spread of MGEs, depending on the associated costs of the latter. Perhaps, more generally and more importantly, given that HGT is the major route of acquisition of novel traits by bacteria, the active restriction of the gene flow can compromise bacterial adaptation to diverse, fluctuating environmental conditions (Woods et al., 2020; Arnold et al., 2022). In this scenario, the costs associated with defense systems can outweigh their benefits, leading to a reduction in their activity or even complete inactivation and subsequent loss. Furthermore, the extent of HGT varies among different bacterial species and strains depending on the ecological conditions (Smillie et al., 2011; Groussin et al., 2021). For example, niche specialists that occupy stable environmental habitats tend to have closed pangenomes and lower genetic diversity relative to generalists with open pangenomes (Brockhurst et al., 2019). As a result, for some bacteria, ecological barriers to HGT could be so pronounced that the restriction of HGT by defense systems becomes limited and could be neither statistically significant nor biologically relevant.
Bacteria rely on different molecular strategies, including innate immunity, adaptive immunity, and abortive infection to combat infection by phages and restrict the invasion of other costly MGEs, such as conjugative plasmids (Makarova et al., 2021). Such multilayered defense organization provides cells with an enhanced capability to withstand assaults by diverse MGEs. Abortive infection is typically used by bacteria as a last-resort defense strategy during the final stages of phage reproduction when the cell lysis becomes imminent (Lopatina et al., 2020; Rousset and Sorek, 2023). Therefore, integration of prophages or uptake of conjugative plasmids are not likely to trigger this type of immune response, and consequently, abortive infection appears unlikely to substantially interfere with HGT facilitated by phages and other MGEs. Indeed, among the 6 defense systems we found to be associated with a reduced gene gain rate, none is known to be involved in abortive infection response. By contrast, 3 of these defense systems are CRISPR-Cas variants from Pseudomonas aeruginosa, Klebsiella pneumoniae, and Streptococcus pyogenes that appear to restrict gene gains, primarily, by interfering with prophage integration. On the other hand, we found no evidence that CRISPR-Cas-I-E in Escherichia coli and Salmonella enterica impacts gene acquisition on the species level, which is consistent with prior work demonstrating the low activity of these systems (Westra et al., 2010; Shariat et al., 2015).
Our observations of a positive association of some defense systems with the HGT rate seem to be explained away by their hijacking of MGEs. Nevertheless, bona fide stimulation of HGT by defense systems cannot be ruled out. For example, in Pectobacterium atrosepticum, CRISPR-Cas-I-F can promote HGT through generalized transduction by boosting the survival rate of cells that receive transduced genetic material during infection, while the defense system inhibits lytic phages (Watson et al., 2018). Such a mechanism could represent an adaptation to facilitate the gene flow while maintaining an active defense system that deters deleterious MGEs. Consequently, the interplay of population-level dynamics among various MGEs and bacteria that harbor defense systems can determine the varying effects of defense systems on HGT.
Experimental Procedures
Generation of the dataset
All bacterial assemblies with the ‘complete genome’ assembly level were downloaded from the RefSeq database (accessed October 20, 2023) (O’Leary et al., 2016). A total of 30,177 assemblies with a completeness greater than 90% and contamination less than 5%, as determined by CheckM, were retained for subsequent analyses (Parks et al., 2015). The taxonomy of each retained assembly was determined either by accessing the GTDB taxonomy database or via de novo prediction using GTDB-Tk v2.3.2 (Parks et al., 2018; Chaumeil et al., 2019). Average Nucleotide Identity (ANI) values were calculated using FastANI v1.33 (Jain et al., 2018), and genomes were dereplicated using the ANI cutoff of 99.9% via a single-linkage clustering algorithm (due to the substantially larger number of available genomes, for E. coli and K. pneumoniae, the ANI threshold was reduced to 99.5% and 99.8%, respectively). Fifteen species represented by at least 100 dereplicated genomes were retained.
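Single-linkage dereplication at an ANI cutoff can be sketched with a union-find structure, as below. The function name, the input layout (triples of genome, genome, ANI), and the toy values are hypothetical; the rule for picking a cluster representative is not stated in the text, so keeping the first-listed genome is an arbitrary assumption.

```python
def dereplicate(genomes, ani_pairs, cutoff=99.9):
    """Single-linkage clustering of genomes by pairwise ANI.

    Two genomes end up in the same cluster if they are connected by a chain
    of pairs with ANI >= cutoff. Returns one representative per cluster.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, ani in ani_pairs:
        if ani >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    # Assumption: the first-listed member represents its cluster.
    return sorted(members[0] for members in clusters.values())


# Hypothetical ANI values (%). A and C fall below the cutoff directly,
# but are linked through B under single linkage.
genomes = ["A", "B", "C", "D"]
ani_pairs = [("A", "B", 99.95), ("B", "C", 99.92), ("A", "C", 99.80), ("C", "D", 98.70)]
representatives = dereplicate(genomes, ani_pairs)
print(representatives)
```

Single linkage merges aggressively: one pair above the cutoff is enough to join two clusters, which suits dereplication, where the goal is to remove near-identical genomes rather than to delineate tight groups.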
Genomes from the retained species were annotated using Prokka v1.14.6 with default parameters (Seemann, 2014). In each retained genome, defense systems were predicted using DefenseFinder (last updated November 2023) (Abby et al., 2014; Tesson et al., 2022). Twelve species that encoded a minimum of two defense systems found in at least 20% of genomes, but not more than 80%, were retained for subsequent analyses (Supplementary Table S1).
Identification of orthologous gene groups and reconstruction of reference phylogenetic trees
For each retained species, orthologous groups were predicted using Panaroo v1.3.4 in the sensitive mode (Tonkin-Hill et al., 2020). Protein sequences encoded by genes from orthologous groups that were represented in at least 95% of the genomes (defined as ‘core genes’) were aligned using MAFFT v7.520 (Katoh and Standley, 2013). Phylogenetically informative alignments were filtered using BMGE (Block Mapping and Gathering with Entropy) as implemented in Panaroo (Criscuolo and Gribaldo, 2010). The retained alignments were concatenated into a single superalignment. The superalignments were trimmed using ClipKIT v2.1.1 to keep only parsimony-informative sites (Steenwyk et al., 2020), and phylogenetic trees were reconstructed using IQ-TREE v2.2.5 with the GTR+I+G model (Minh et al., 2020). The reconstructed phylogenetic trees were rooted using the minimum ancestral deviation method (Tria et al., 2017). Phylogenetic trees were visualized in iTOL v6 (Letunic and Bork, 2021).
Correlation between the gene gain rate and defense system presence
The presence-absence matrix of orthologous gene groups was transformed into binary sequences in FASTA format that were used as input for the locally installed GLOOME software (Cohen et al., 2010). The reconstructed ancestral states and data for the original taxa were mapped onto the reference phylogenetic trees. For each orthologous group, the changes for every branch were calculated by subtracting each ancestral state from the descendant state and keeping the results only if the difference was positive. For every branch, the retained changes across all orthologous groups were summed into a single number and normalized by the branch length. Similarly, the ancestral states were reconstructed for each defense system, and for every branch in the phylogenetic tree, the state of each defense system was represented as an average between its ancestral and descendant states. Then, the Spearman’s correlation coefficient was calculated for the gene gain rate and the presence of each defense system. To calculate the p-value, values of the gene gain rate and the state of defense systems were randomly shuffled, and Spearman’s correlation coefficient was calculated for the shuffled dataset. This procedure was repeated 10,000 times to generate a null distribution. To minimize the false positives, the association between gene gain rate and defense system was deemed significant if the absolute value of Spearman’s correlation coefficient was greater than 0.3, and the p-value was less than 0.0001. This threshold for Spearman’s correlation coefficient was chosen based on the observation that this value was never exceeded in the randomly permuted datasets.
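The permutation test for Spearman's correlation can be sketched in plain Python as follows. Several details are assumptions made for illustration: the test is taken to be two-sided, the p-value uses the common add-one correction (so that it is never exactly zero), and the function names, the fixed random seed, and the toy branch values are hypothetical.

```python
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values receiving their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_permutation_test(x, y, n_perm=10_000, seed=0):
    """Spearman's rho and a two-sided permutation p-value."""
    rng = random.Random(seed)
    ranks_x, ranks_y = average_ranks(x), average_ranks(y)
    observed = pearson(ranks_x, ranks_y)
    shuffled = list(ranks_y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between the two variables
        if abs(pearson(ranks_x, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-branch values: defense system state and gene gain rate.
defense_state = [0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0]
gain_rate = [40.0, 35.0, 30.0, 22.0, 15.0, 9.0, 5.0, 2.0]
rho, p_value = spearman_permutation_test(defense_state, gain_rate)
print(rho, p_value)
```

Because ranks are invariant under shuffling, the ranks are computed once and only their pairing is permuted. With 10,000 permutations and the add-one correction, the smallest attainable p-value is 1/10,001, just below the 0.0001 threshold used here.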
Computation of the spacer turnover rate
Genomes encompassing CRISPR-Cas systems were searched for spacers using CRISPRIdentify (Mitrofanov et al., 2021). Only “bona-fide” and “possible” candidates were retained for further analyses. Additionally, only spacers located in the same genetic neighborhood (within a 10,000 bp range) as the analyzed CRISPR-Cas systems were kept. For each species, an all-vs-all search of spacers was performed using MMseqs2 under the high sensitivity mode, and the query and subject overlap of at least 80% (e-value < 0.01) (Steinegger and Soding, 2017). For every pair of taxa with a pairwise phylogenetic distance of less than 1, the spacer turnover rate was calculated as:
Prediction of phage-related genes and search for corresponding spacers
HMM models were downloaded from the pVOG database (Grazziotin et al., 2017) and merged into a single HMM database using hmmpress. For each species, predicted proteins were clustered using MMseqs2 with query and subject coverage of at least 60% and an e-value of less than 0.001 (Steinegger and Soding, 2017). A representative from each cluster was searched against the pVOG database using hmmscan (e-value < 1E-10, database size = 100,000) (Eddy, 2011). If the representative had a corresponding hit in the pVOG database, all proteins in the cluster were labeled as “phage-related proteins”.
To estimate the fraction of ‘phage-related proteins’ that are targeted by CRISPR-Cas systems, nucleotide sequences of phage-related genes were extracted from the FASTA file. The spacers were used as queries in an MMseqs2 search against phage-related genes with a minimum sequence identity of 95% and a minimum spacer coverage of 90%.
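The final filtering step can be sketched as below, assuming the search results have already been parsed into tuples of spacer ID, gene ID, fractional sequence identity, and fractional spacer coverage. This tuple layout, the function name, and the toy hits are hypothetical and do not reflect the MMseqs2 output format.

```python
def targeted_fraction(hits, n_phage_genes, min_identity=0.95, min_spacer_cov=0.90):
    """Fraction of phage-related genes matched by at least one spacer.

    hits: iterable of (spacer_id, gene_id, identity, spacer_coverage),
    with identity and coverage expressed as fractions between 0 and 1.
    """
    targeted = {
        gene for _spacer, gene, identity, coverage in hits
        if identity >= min_identity and coverage >= min_spacer_cov
    }
    return len(targeted) / n_phage_genes


# Hypothetical hits against a pool of 50 phage-related genes.
toy_hits = [
    ("sp1", "geneA", 1.00, 1.00),   # passes both thresholds
    ("sp2", "geneA", 0.97, 0.95),   # same gene, counted once
    ("sp3", "geneB", 0.96, 0.91),   # passes both thresholds
    ("sp4", "geneC", 0.90, 1.00),   # identity too low
    ("sp5", "geneD", 0.99, 0.80),   # spacer coverage too low
]
fraction = targeted_fraction(toy_hits, n_phage_genes=50)
print(fraction)
```

Counting distinct genes rather than hits prevents a gene targeted by several spacers from inflating the fraction.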
Phylogenetic profiling of defense systems
To minimize spurious associations due to the phylogenetic signal, for each orthologous group, the Fritz and Purvis’ D-value (with 1,000 permutations) was calculated using the caper package (Fritz and Purvis, 2010; Orme et al., 2013). Only orthologous groups with D-values greater than −0.2 were retained for subsequent analyses. Furthermore, orthologous groups that were present in fewer than 10% or more than 90% of the genomes were discarded. Phylogenetic profiling was conducted using Coinfinder, and p-values were adjusted via the Bonferroni correction (Whelan et al., 2020). Only defense systems with D-values greater than −0.2 were retained. An orthologous group was considered associated with a multigene defense system if it showed a significant association with all genes of this defense system. The significantly associated orthologous groups were annotated using the CDD database v3.2 (Lu et al., 2020) and a BLASTP search against the UniProtKB database (e-value < 0.1) (Altschul et al., 1990; UniProt Consortium, 2021).
Supplementary Material
Acknowledgments
The authors’ research is funded by the Intramural Research Program of the National Institutes of Health (National Library of Medicine).
Footnotes
Conflict of interest
The authors declare no conflict of interest.
Data availability statement
Derived data and custom scripts are deposited at Zenodo: https://zenodo.org/doi/10.5281/zenodo.10637201. The IDs of the analyzed assemblies are provided in Supplementary Table S1, and the assemblies are publicly available in the NCBI RefSeq database.
References
- Abby SS, Neron B, Menager H, Touchon M, and Rocha EP (2014) MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems. PLoS One 9: e110726. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alonso-Del Valle A, Leon-Sampedro R, Rodriguez-Beltran J, DelaFuente J, Hernandez-Garcia M, Ruiz-Garbajosa P et al. (2021) Variability of plasmid fitness effects contributes to plasmid persistence in bacterial communities. Nat Commun 12: 2653. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403–410. [DOI] [PubMed] [Google Scholar]
- Arnold BJ, Huang IT, and Hanage WP (2022) Horizontal gene transfer and adaptive evolution in bacteria. Nat Rev Microbiol 20: 206–218. [DOI] [PubMed] [Google Scholar]
- Bernheim A, and Sorek R (2020) The pan-immune system of bacteria: antiviral defence as a community resource. Nat Rev Microbiol 18: 113–119. [DOI] [PubMed] [Google Scholar]
- Bernheim A, Millman A, Ofir G, Meitav G, Avraham C, Shomar H et al. (2021) Prokaryotic viperins produce diverse antiviral molecules. Nature 589: 120–124. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bikard D, Hatoum-Aslan A, Mucida D, and Marraffini LA (2012) CRISPR interference can prevent natural transformation and virulence acquisition during in vivo bacterial infection. Cell Host Microbe 12: 177–186. [DOI] [PubMed] [Google Scholar]
- Botelho J (2023) Defense systems are pervasive across chromosomally integrated mobile genetic elements and are inversely correlated to virulence and antimicrobial resistance. Nucleic Acids Res 51: 4385–4397. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brockhurst MA, Harrison E, Hall JPJ, Richards T, McNally A, and MacLean C (2019) The Ecology and Evolution of Pangenomes. Curr Biol 29: R1094–R1103. [DOI] [PubMed] [Google Scholar]
- Chaumeil PA, Mussig AJ, Hugenholtz P, and Parks DH (2019) GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics.
- Cohen O, Ashkenazy H, Belinky F, Huchon D, and Pupko T (2010) GLOOME: gain loss mapping engine. Bioinformatics 26: 2914–2915.
- Criscuolo A, and Gribaldo S (2010) BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 10: 210.
- Davray D, Deo D, and Kulkarni R (2021) Plasmids encode niche-specific traits in Lactobacillaceae. Microb Genom 7.
- Deep A, Gu Y, Gao YQ, Ego KM, Herzik MA Jr., Zhou H, and Corbett KD (2022) The SMC-family Wadjet complex protects bacteria from plasmid transformation by recognition and cleavage of closed-circular DNA. Mol Cell 82: 4145–4159 e4147.
- Doron S, Melamed S, Ofir G, Leavitt A, Lopatina A, Keren M et al. (2018) Systematic discovery of antiphage defense systems in the microbial pangenome. Science 359.
- Dupuis ME, Villion M, Magadan AH, and Moineau S (2013) CRISPR-Cas and restriction-modification systems are compatible and increase phage resistance. Nat Commun 4: 2087.
- Eddy SR (2011) Accelerated Profile HMM Searches. PLoS Comput Biol 7: e1002195.
- Fritz SA, and Purvis A (2010) Selectivity in mammalian extinction risk and threat types: a new measure of phylogenetic signal strength in binary traits. Conserv Biol 24: 1042–1051.
- Gao L, Altae-Tran H, Bohning F, Makarova KS, Segel M, Schmid-Burgk JL et al. (2020) Diverse enzymatic activities mediate antiviral immunity in prokaryotes. Science 369: 1077–1084.
- Georjon H, and Bernheim A (2023) The highly diverse antiphage defence systems of bacteria. Nat Rev Microbiol 21: 686–700.
- Gophna U, Kristensen DM, Wolf YI, Popa O, Drevet C, and Koonin EV (2015) No evidence of inhibition of horizontal gene transfer by CRISPR-Cas on evolutionary timescales. ISME J 9: 2021–2027.
- Grazziotin AL, Koonin EV, and Kristensen DM (2017) Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation. Nucleic Acids Res 45: D491–D498.
- Groussin M, Poyet M, Sistiaga A, Kearney SM, Moniz K, Noel M et al. (2021) Elevated rates of horizontal gene transfer in the industrialized human microbiome. Cell 184: 2053–2067 e2018.
- Ives AR, and Garland T Jr. (2010) Phylogenetic logistic regression for binary dependent variables. Syst Biol 59: 9–26.
- Jahn MT, Arkhipova K, Markert SM, Stigloher C, Lachnit T, Pita L et al. (2019) A Phage Protein Aids Bacterial Symbionts in Eukaryote Immune Evasion. Cell Host Microbe 26: 542–550 e545.
- Jain C, Rodriguez RL, Phillippy AM, Konstantinidis KT, and Aluru S (2018) High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 9: 5114.
- Jaskolska M, Adams DW, and Blokesch M (2022) Two defence systems eliminate plasmids from seventh pandemic Vibrio cholerae. Nature 604: 323–329.
- Katoh K, and Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30: 772–780.
- Keen EC (2015) A century of phage research: bacteriophages and the shaping of modern biology. Bioessays 37: 6–9.
- Kelleher P, Bottacini F, Mahony J, Kilcawley KN, and van Sinderen D (2017) Comparative and functional genomics of the Lactococcus lactis taxon; insights into evolution and niche adaptation. BMC Genomics 18: 267.
- Kieft K, Zhou Z, Anderson RE, Buchan A, Campbell BJ, Hallam SJ et al. (2021) Ecology of inorganic sulfur auxiliary metabolism in widespread bacteriophages. Nat Commun 12: 3503.
- Letunic I, and Bork P (2021) Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res 49: W293–W296.
- Lopatina A, Tal N, and Sorek R (2020) Abortive Infection: Bacterial Suicide as an Antiviral Immune Strategy. Annu Rev Virol 7: 371–384.
- Lopatkin AJ, Meredith HR, Srimani JK, Pfeiffer C, Durrett R, and You L (2017) Persistence and reversal of plasmid-mediated antibiotic resistance. Nat Commun 8: 1689.
- Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR et al. (2020) CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res 48: D265–D268.
- Makarova KS, Wolf YI, and Koonin EV (2013) Comparative genomics of defense systems in archaea and bacteria. Nucleic Acids Res 41: 4360–4377.
- Makarova KS, Wolf YI, and Koonin EV (2021) Defense Against Viruses and Other Genetic Parasites in Prokaryotes. In Encyclopedia of Virology (Fourth Edition). Bamford DH, and Zuckerman M (eds). Oxford: Academic Press, pp. 606–616.
- Makarova KS, Wolf YI, Snir S, and Koonin EV (2011) Defense islands in bacterial and archaeal genomes and prediction of novel defense systems. J Bacteriol 193: 6039–6056.
- Makarova KS, Wolf YI, Iranzo J, Shmakov SA, Alkhnbashi OS, Brouns SJJ et al. (2020) Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants. Nat Rev Microbiol 18: 67–83.
- Marraffini LA, and Sontheimer EJ (2008) CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA. Science 322: 1843–1845.
- Millman A, Melamed S, Leavitt A, Doron S, Bernheim A, Hor J et al. (2022) An expanded arsenal of immune systems that protect bacteria from phages. Cell Host Microbe 30: 1556–1569 e1555.
- Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, and Lanfear R (2020) IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol 37: 1530–1534.
- Mira A, Ochman H, and Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends Genet 17: 589–596.
- Mitrofanov A, Alkhnbashi OS, Shmakov SA, Makarova KS, Koonin EV, and Backofen R (2021) CRISPRidentify: identification of CRISPR arrays using machine learning approach. Nucleic Acids Res 49: e20.
- Mushegian AR (2020) Are There 10^31 Virus Particles on Earth, or More, or Fewer? J Bacteriol 202.
- Nowell RW, Green S, Laue BE, and Sharp PM (2014) The extent of genome flux and its role in the differentiation of bacterial lineages. Genome Biol Evol 6: 1514–1529.
- O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R et al. (2016) Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44: D733–745.
- Ochman H, Lawrence JG, and Groisman EA (2000) Lateral gene transfer and the nature of bacterial innovation. Nature 405: 299–304.
- Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, Isaac N, and Pearse W (2013) The caper package: comparative analysis of phylogenetics and evolution in R. R package version 5: 1–36.
- Paez-Espino D, Morovic W, Sun CL, Thomas BC, Ueda K, Stahl B et al. (2013) Strong bias in the bacterial CRISPR elements that confer immunity to phage. Nat Commun 4: 1430.
- Palmer KL, and Gilmore MS (2010) Multidrug-resistant enterococci lack CRISPR-cas. mBio 1.
- Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, and Tyson GW (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25: 1043–1055.
- Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA, and Hugenholtz P (2018) A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36: 996–1004.
- Props R, Monsieurs P, Vandamme P, Leys N, Denef VJ, and Boon N (2019) Gene Expansion and Positive Selection as Bacterial Adaptations to Oligotrophic Conditions. mSphere 4.
- Puigbo P, Makarova KS, Kristensen DM, Wolf YI, and Koonin EV (2017) Reconstruction of the evolution of microbial defense systems. BMC Evol Biol 17: 94.
- Rendueles O, de Sousa JAM, and Rocha EPC (2023) Competition between lysogenic and sensitive bacteria is determined by the fitness costs of the different emerging phage-resistance strategies. Elife 12.
- Rousset F, and Sorek R (2023) The evolutionary success of regulated cell death in bacterial immunity. Curr Opin Microbiol 74: 102312.
- Seemann T (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics 30: 2068–2069.
- Shariat N, Timme RE, Pettengill JB, Barrangou R, and Dudley EG (2015) Characterization and evolution of Salmonella CRISPR-Cas systems. Microbiology (Reading) 161: 374–386.
- Shehreen S, Chyou TY, Fineran PC, and Brown CM (2019) Genome-wide correlation analysis suggests different roles of CRISPR-Cas systems in the acquisition of antibiotic resistance genes in diverse species. Philos Trans R Soc Lond B Biol Sci 374: 20180384.
- Smillie CS, Smith MB, Friedman J, Cordero OX, David LA, and Alm EJ (2011) Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480: 241–244.
- Steenwyk JL, Buida TJ 3rd, Li Y, Shen XX, and Rokas A (2020) ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference. PLoS Biol 18: e3001007.
- Steinegger M, and Soding J (2017) MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 35: 1026–1028.
- Suttle CA (2007) Marine viruses--major players in the global ecosystem. Nat Rev Microbiol 5: 801–812.
- Tesson F, Herve A, Mordret E, Touchon M, d’Humieres C, Cury J, and Bernheim A (2022) Systematic and quantitative view of the antiviral arsenal of prokaryotes. Nat Commun 13: 2561.
- Tonkin-Hill G, MacAlasdair N, Ruis C, Weimann A, Horesh G, Lees JA et al. (2020) Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol 21: 180.
- Tria FDK, Landan G, and Dagan T (2017) Phylogenetic rooting using minimal ancestor deviation. Nat Ecol Evol 1: 193.
- UniProt Consortium (2021) UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 49: D480–D489.
- Watson BNJ, Staals RHJ, and Fineran PC (2018) CRISPR-Cas-Mediated Phage Resistance Enhances Horizontal Gene Transfer by Transduction. mBio 9.
- Weiss M, Giacomelli G, Assaya MB, Grundt F, Haouz A, Peng F et al. (2023) The MksG nuclease is the executing part of the bacterial plasmid defense system MksBEFG. Nucleic Acids Res 51: 3288–3306.
- Westra ER, Pul U, Heidrich N, Jore MM, Lundgren M, Stratmann T et al. (2010) H-NS-mediated repression of CRISPR-based immunity in Escherichia coli K12 can be relieved by the transcription activator LeuO. Mol Microbiol 77: 1380–1393.
- Wheatley RM, and MacLean RC (2021) CRISPR-Cas systems restrict horizontal gene transfer in Pseudomonas aeruginosa. ISME J 15: 1420–1433.
- Whelan FJ, Rusilowicz M, and McInerney JO (2020) Coinfinder: detecting significant associations and dissociations in pangenomes. Microb Genom 6.
- Wilson GG (1991) Organization of restriction-modification systems. Nucleic Acids Res 19: 2539–2566.
- Woods LC, Gorrell RJ, Taylor F, Connallon T, Kwok T, and McDonald MJ (2020) Horizontal gene transfer potentiates adaptation by reducing selective constraints on the spread of genetic variation. Proc Natl Acad Sci U S A 117: 26868–26875.
- Xue C, and Sashital DG (2019) Mechanisms of Type I-E and I-F CRISPR-Cas Systems in Enterobacteriaceae. EcoSal Plus 8.
Associated Data
Data Availability Statement
Derived data and custom scripts are deposited at Zenodo: https://zenodo.org/doi/10.5281/zenodo.10637201. The IDs of the analyzed assemblies are provided in Supplementary Table 1 and are publicly available in the NCBI RefSeq database.
