Abstract
Many prokaryotes encode CRISPR-Cas systems as immune protection against mobile genetic elements (MGEs), yet a number of MGEs also harbor CRISPR-Cas components. With a few exceptions, CRISPR-Cas loci encoded on MGEs are uncharted and a comprehensive analysis of their distribution, prevalence, diversity, and function is lacking. Here, we systematically investigated CRISPR-Cas loci across the largest curated collection of natural bacterial and archaeal plasmids. CRISPR-Cas loci are widely but heterogeneously distributed across plasmids and, in comparison to host chromosomes, their mean prevalence per Mbp is higher and their distribution is distinct. Furthermore, the spacer content of plasmid CRISPRs exhibits a strong targeting bias towards other plasmids, while chromosomal arrays are enriched with virus-targeting spacers. These contrasting targeting preferences highlight the genetic independence of plasmids and suggest a major role for mediating plasmid-plasmid conflicts. Altogether, CRISPR-Cas are frequent accessory components of many plasmids, which is an overlooked phenomenon that possibly facilitates their dissemination across microbiomes.
INTRODUCTION
Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated (cas) genes encode adaptive immune systems that provide prokaryotes with sequence-specific protection against viruses, plasmids, and other mobile genetic elements (MGEs) (1). These systems consist of two main components: (i) a CRISPR array, which is a DNA memory bank composed of sequences derived from previous infections by MGEs, and (ii) cas genes that encode the protein machinery that is necessary for the three stages of immunity (adaptation, RNA biogenesis and interference) (2). Briefly, during adaptation, short sequence fragments from the genomes of invading MGEs are integrated at the CRISPR leader end as new ‘spacers’ flanked directly by repeats in the array. Biogenesis involves expression of the CRISPR array as a long transcript (pre-crRNA) and its subsequent processing into mature CRISPR RNAs (crRNAs), each corresponding to a single spacer. Finally, during interference, the mature crRNAs are coupled with one or multiple Cas proteins in search of a complementary sequence (protospacer), leading to the nuclease-dependent degradation of target nucleic acids.
CRISPR-Cas systems are broadly distributed across the genomes of about 42% of bacteria and 85% of archaea (3). Despite the aforementioned commonalities, these systems display remarkable diversity in their mechanisms of action and in the phylogeny of their components. They are divided into two major classes, six types and more than 45 subtypes on the basis of the distinct architectures and the organization of their effector modules (3,4). Previous work has focused primarily on investigating the canonical adaptive immune functions of CRISPR-Cas systems, their distributions across prokaryotic lineages, and their numerous biotechnological applications (5,6). Although much less attention has been paid to their presence and function in MGEs, recent research demonstrates that CRISPR-Cas loci are encoded by different types of MGEs (7). Several viruses, transposons, and plasmids have been shown to carry CRISPR-Cas components that perform different roles, including participating in inter-MGE warfare (4,8–12), RNA-guided DNA transposition (13–15) and in anti-defense functions (7,16).
Plasmids are extrachromosomal, self-replicating MGEs that are ubiquitous across microbiomes on Earth. They are known to shape the ecology and evolution of microbial communities by, for example, promoting horizontal gene transfer (HGT) between taxa (17,18). Although the fates of plasmids are linked to those of their microbial hosts, plasmids and host chromosomes are subject to distinct selective constraints and follow different evolutionary trajectories (19,20). Despite the beneficial traits that some plasmids provide to their hosts under certain conditions (e.g. antibiotic or heavy metal resistance), they can also impose a physiological burden. Thus, plasmid-host relationships are often dynamic and, depending on the ecological context, extend from parasitic to mutualistic (19). Epitomizing the existence of plasmid-host conflicts, a fraction of chromosomal CRISPR spacers typically match plasmids (21–23). Furthermore, several studies have reported experimental evidence for strong CRISPR-based anti-plasmid immunity (24–26). In turn, many plasmids carry anti-CRISPR proteins that block host CRISPR-Cas targeting (27,28).
Even though some plasmids have been reported to encode CRISPR-Cas loci (7,29–36), their incidence, diversity, distribution and function(s) remain largely unstudied. Type IV CRISPR-Cas systems, in particular, are found almost exclusively on plasmids (3,7,37) and recent work indicates that they participate in plasmid–plasmid competition dynamics (4,11). Furthermore, a study analyzing CRISPR-Cas systems across a large subset of prokaryotic genomes identified several plasmid-encoded CRISPR-Cas loci, whereas very few were encoded by associated (pro)phages (34). Here, we undertook the first systematic investigation of CRISPR-Cas contents across publicly available bacterial and archaeal plasmid data. We focused on analysing their prevalence, distribution and diversity, and investigated their CRISPR array spacer contents to infer their biological functions.
MATERIALS AND METHODS
Software and code availability
Scripts for downloading data and reproducing all analyses are available at https://github.com/Russel88/CRISPRCas_on_Plasmids. Analyses were made with a combination of shell, Python 3, and R 3.6.3 scripting. Plots were made with ggplot2, heatmaps with pheatmap, phylogenetic trees with iTOL (38), and networks with Gephi (39).
Dataset construction
A total of 27 939 complete bacterial plasmid sequences were downloaded from PLSDB 2020_11_19 (https://ccb-microbe.cs.uni-saarland.de/plsdb), together with their associated metadata (40). A total of 253 manually curated archaeal plasmids were downloaded from NCBI RefSeq on 6 January 2020. Plasmid-host chromosome associations were determined through the NCBI assembly information, for which only sequences annotated as ‘chromosome’ were included as host sequences. Using this approach, we were able to assign a host for 21 974 of the plasmids. The number of archaeal plasmids selected is relatively low because few archaeal plasmids have been characterised and sequenced. We used GTDB-Tk v1.4.1 (41) to re-annotate the taxonomy of the host of each plasmid in a common phylogenomic framework. To filter out redundant plasmids, they were de-replicated using dRep version 3.1.0 (42) with the following parameters: 90% ANI cut-off for primary clustering, 95% ANI cut-off for secondary clustering and a total coverage of 90%, with fastANI (43) as the secondary clustering algorithm. Size was the only criterion used to choose the representative plasmid of each cluster: the largest plasmid was picked, with ties broken at random. Dereplication resulted in a total of 17 828 plasmids, out of which 13 265 could be associated with known prokaryotic hosts.
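The representative-selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' script: the input layout (tuples of plasmid identifier, cluster identifier and length), the function name pick_representatives and the fixed random seed are assumptions made for illustration, and the cluster assignments are taken as given (e.g. from a dereplication tool's output).

```python
import random
from collections import defaultdict


def pick_representatives(plasmids, seed=0):
    """Pick one representative per cluster: the largest plasmid, ties broken at random.

    plasmids: iterable of (plasmid_id, cluster_id, length_bp) tuples (hypothetical layout).
    Returns a dict mapping cluster_id -> chosen plasmid_id.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    clusters = defaultdict(list)
    for plasmid_id, cluster_id, length_bp in plasmids:
        clusters[cluster_id].append((plasmid_id, length_bp))

    representatives = {}
    for cluster_id, members in clusters.items():
        longest = max(length for _, length in members)
        # All plasmids sharing the maximum length are candidates.
        tied = sorted(pid for pid, length in members if length == longest)
        representatives[cluster_id] = rng.choice(tied)
    return representatives


# Made-up example: cluster 1 has a clear winner, cluster 2 has a tie.
example = [("pA", 1, 52_000), ("pB", 1, 61_000), ("pC", 2, 8_000), ("pD", 2, 8_000)]
reps = pick_representatives(example)
```

In this toy input, cluster 1 is represented by pB (the longest member), while cluster 2 is represented by either pC or pD depending on the random draw.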
Identification of CRISPR loci
Detection of CRISPR arrays was carried out by using CRISPRCasFinder 4.2.17 (44), coupled to an optimized algorithm for false-positive array removal (Supplementary Figure S1) and an additional analysis for finding CRISPR loci that are commonly missed by this algorithm. Briefly, high confidence arrays predicted by CRISPRCasFinder (evidence level 4) were automatically kept. The remaining arrays were binned into a ‘quarantine list’ if they were found to clear a series of conservative manually-curated parameter cutoffs: (i) calculated average CRISPR repeat conservation across the array >70%, (ii) spacer conservation <50%, (iii) standard error of the mean of the array's spacer lengths <3 and (iv) array does not overlap with an open reading frame (ORF) with a prediction confidence of at least 90% (45). Putative arrays from the quarantine list were rescued for further analysis if they were found within 1 kb of a predicted cas gene or matched (95% coverage and 95% identity) any previously defined high confidence CRISPR repeat: CRISPRCasFinder evidence level 4 or archived in CRISPRCasdb (46). This upgrade reduced the rate of detection of false positive CRISPRs, most of which constitute short repetitive genomic regions that are erroneously selected by CRISPRCasFinder (47), and which are more common on plasmids (e.g. iterons and tandem transposon-associated repeats) (48–50). High confidence CRISPR repeats (see above) were then BLASTed (task: blastn-short, 95% coverage and 95% identity) to a database in which the CRISPR loci that were already detected were masked, and any matches within 100 bp of each other were clustered into arrays. Arrays with fewer than three repeats were excluded from all analyses.
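The keep/quarantine/rescue logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published pipeline: the Array record and its field names are hypothetical, the per-array statistics are assumed to have been computed upstream, and arrays with fewer than two spacers are treated as failing the spacer-length criterion because the standard error is undefined for them.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import stdev
from typing import List, Optional


@dataclass
class Array:
    """Hypothetical per-array summary; field names are illustrative only."""
    evidence_level: int            # CRISPRCasFinder evidence level
    repeat_conservation: float     # percent
    spacer_conservation: float     # percent
    spacer_lengths: List[int]      # bp
    overlaps_confident_orf: bool   # overlap with an ORF predicted at >= 90% confidence
    dist_to_cas_bp: Optional[int]  # None if no cas gene was predicted on the sequence
    matches_known_repeat: bool     # 95% coverage / 95% identity to a high confidence repeat


def sem(values):
    """Standard error of the mean; undefined for < 2 values, so return infinity (assumption)."""
    if len(values) < 2:
        return float("inf")
    return stdev(values) / sqrt(len(values))


def classify(array):
    """Return 'keep', 'rescue' or 'discard' following the cutoffs quoted in the text."""
    if array.evidence_level == 4:
        return "keep"
    quarantined = (
        array.repeat_conservation > 70
        and array.spacer_conservation < 50
        and sem(array.spacer_lengths) < 3
        and not array.overlaps_confident_orf
    )
    if not quarantined:
        return "discard"
    near_cas = array.dist_to_cas_bp is not None and array.dist_to_cas_bp <= 1000
    return "rescue" if (near_cas or array.matches_known_repeat) else "discard"


# Made-up arrays exercising each branch.
high_conf = Array(4, 95.0, 10.0, [32, 32, 33], False, None, False)
near_cas = Array(3, 88.0, 20.0, [32, 33, 32, 34], False, 400, False)
in_orf = Array(3, 88.0, 20.0, [32, 33, 32, 34], True, 400, True)
lonely = Array(2, 88.0, 20.0, [32, 33, 32, 34], False, None, False)
```

Here high_conf is kept outright, near_cas is rescued through its proximity to a cas gene, in_orf never enters the quarantine list because it overlaps a confident ORF, and lonely is quarantined but not rescued.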
Identification and typing of cas loci
The prediction and classification (at the subtype or variant level) of cas operons was carried out by CRISPRCasTyper 1.2.4 (https://github.com/Russel88/CRISPRCasTyper) (51). CRISPR arrays closer than 10 kb to the nearest cas operon were considered to be linked; the 10 kb cutoff was based on an analysis of the distribution of distances of CRISPR arrays to the closest cas operon (Supplementary Figure S2). Furthermore, we used CRISPR-repeat similarity information to type arrays that were not found linked to cas operons. These distant arrays (>10 kb from the nearest cas operon) were considered associated with a cas operon if the direct repeat sequence was at least 85% identical to the direct repeat sequence of an array adjacent to that cas operon (Supplementary Figure S3). When possible, CRISPR-Cas systems annotated as ‘Ambiguous’ were manually subtyped. The identified CRISPR-Cas loci on plasmids, plasmid-associated host chromosomes and related information are found in Supplementary Datasheet S1.
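The two-step linking of arrays to cas operons (first by distance, then by repeat similarity) can be illustrated with the sketch below. It makes several assumptions that are not taken from the source: all features lie on the same sequence, records are plain dictionaries with hypothetical keys, and repeat identity is approximated by a crude ungapped, position-by-position comparison rather than a proper alignment. The coordinates and repeat sequences are invented.

```python
def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bp between two intervals on one sequence (0 if they overlap)."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def ungapped_identity(r1, r2):
    """Crude stand-in for alignment-based identity: matching positions / longer length."""
    matches = sum(x == y for x, y in zip(r1, r2))
    return matches / max(len(r1), len(r2))


def link_arrays(arrays, operons, max_dist=10_000, min_identity=0.85):
    """Assign each array a (subtype, evidence) pair.

    Pass 1: arrays closer than max_dist to the nearest cas operon inherit its subtype.
    Pass 2: remaining arrays inherit a subtype if their repeat is >= min_identity
            identical to the repeat of an array linked in pass 1.
    """
    result, linked_repeats = {}, []
    for arr in arrays:
        nearest = min(
            operons,
            key=lambda op: interval_distance(arr["start"], arr["end"], op["start"], op["end"]),
            default=None,
        )
        if nearest is not None:
            dist = interval_distance(arr["start"], arr["end"], nearest["start"], nearest["end"])
            if dist < max_dist:
                result[arr["id"]] = (nearest["subtype"], "adjacent")
                linked_repeats.append((arr["repeat"], nearest["subtype"]))
    for arr in arrays:
        if arr["id"] in result:
            continue
        best = max(linked_repeats, key=lambda x: ungapped_identity(arr["repeat"], x[0]), default=None)
        if best is not None and ungapped_identity(arr["repeat"], best[0]) >= min_identity:
            result[arr["id"]] = (best[1], "repeat")
        else:
            result[arr["id"]] = (None, "orphan")
    return result


# Invented example: one operon, one adjacent array, one distant array with a
# near-identical repeat (one mismatch), and one unrelated distant array.
operons = [{"subtype": "I-E", "start": 20_000, "end": 26_000}]
arrays = [
    {"id": "A", "start": 27_000, "end": 27_500, "repeat": "GTGTTCCCCGCGCCAGCGGGGATAAACCG"},
    {"id": "B", "start": 90_000, "end": 90_400, "repeat": "GTGTTCCCCGCGCCAGCGGGGATAAACCA"},
    {"id": "C", "start": 150_000, "end": 150_300, "repeat": "CTAAGAGAATATACGTTTTAGTCCATTGA"},
]
links = link_arrays(arrays, operons)
```

Array A is linked by proximity (1 kb from the operon), array B is typed through repeat similarity, and array C remains an untyped orphan.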
Indicator analysis
Enrichment of certain CRISPR-Cas subtypes on either plasmids or host chromosomes was investigated with an indicator species analysis, using the indicspecies R package. For the comparison between all plasmids and chromosomes the IndVal.g statistic was used, which controls for differences in group sizes. For the direct comparison between plasmids and host chromosomes, where both carry CRISPR-Cas, the IndVal statistic was used. Statistical significance was determined by permutation (n = 9999) and a Bonferroni adjusted P-value threshold of 0.05 was used.
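To make the indicator statistic concrete, the following Python sketch computes a group-equalized indicator value for presence/absence data together with a permutation P-value. It reflects our reading of the statistic (square root of specificity times fidelity, with specificity computed from per-group frequencies so that group sizes do not dominate) and is an assumption-laden illustration, not a re-implementation of the indicspecies package; the function names, the toy counts and the reduced number of permutations are all hypothetical.

```python
import random
from math import sqrt


def indval_g(present, groups, target):
    """Group-equalized indicator value of one subtype for the `target` group.

    present: list of bool (subtype present on each replicon)
    groups:  list of group labels, same length as `present`
    """
    freq = {}
    for g in set(groups):
        members = [p for p, gg in zip(present, groups) if gg == g]
        freq[g] = sum(members) / len(members)  # fraction of replicons carrying the subtype
    total = sum(freq.values())
    if total == 0:
        return 0.0
    specificity = freq[target] / total  # component A, equalized across groups
    fidelity = freq[target]             # component B
    return sqrt(specificity * fidelity)


def permutation_p(present, groups, target, n_perm=999, seed=0):
    """P-value from shuffling group labels; (hits + 1) / (n_perm + 1) convention assumed."""
    rng = random.Random(seed)
    observed = indval_g(present, groups, target)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if indval_g(present, shuffled, target) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: subtype present on 6 of 10 plasmids and 2 of 40 chromosomes.
groups = ["plasmid"] * 10 + ["chromosome"] * 40
present = [True] * 6 + [False] * 4 + [True] * 2 + [False] * 38
stat, p_value = permutation_p(present, groups, "plasmid")
```

With several subtypes tested, each P-value would then be compared against the Bonferroni-adjusted threshold, i.e. 0.05 divided by the number of tests.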
Plasmid conjugative transfer and incompatibility group prediction
The conjugative transfer functions and incompatibility (Inc) types of all plasmids in PLSDB were predicted with the mob_typer function of MOB-suite v3.0.1 (52), using default parameters.
Spacer-protospacer match analysis
The genomic regions where CRISPR arrays were identified on plasmids (including CRISPR arrays with two repeats, which were otherwise excluded from the analyses) were masked in order to avoid false positive matches to spacers in arrays. Furthermore, for matches to plasmids only matches to high confidence ORFs were included, also to rule out any matches to possibly undetected CRISPR arrays. Spacers from orphan arrays whose consensus repeat could not be typed by repeatTyper from CRISPRCasTyper (https://typer.crispr.dk, model version 2021_03 (51)) were excluded from the spacer analysis to avoid any bias stemming from possible false positive arrays in this group.
Viral genomes were obtained from the IMG/VR v3 (2020-10-12_5.1, (53)) only including those annotated as ‘Reference’, which includes 39 296 viral genomes. Spacer sequences from plasmids and plasmid-associated host chromosomes were aligned against the masked dereplicated plasmid database and the virus database using FASTA 36.3.8e (54). Alignments were filtered using an e-value cutoff of 0.05. To reduce redundancy bias, spacers were only counted once, no matter the absolute number of matches.
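The filtering and counting rule described above (e-value cutoff, each spacer counted once regardless of how many matches it has) can be sketched as follows. The input layout, the function name summarize_targets, the strict "less than" comparison at the cutoff and the separate "both" category for spacers matching plasmids and viruses are assumptions made for illustration; the source does not specify them.

```python
def summarize_targets(hits, evalue_cutoff=0.05):
    """Count spacers by target class, counting every spacer only once.

    hits: iterable of (spacer_id, target_kind, evalue) with target_kind in
          {"plasmid", "virus"} (hypothetical layout of parsed alignment output).
    """
    targets = {}
    for spacer_id, kind, evalue in hits:
        if evalue < evalue_cutoff:  # assumption: strict comparison at the cutoff
            targets.setdefault(spacer_id, set()).add(kind)

    counts = {"plasmid": 0, "virus": 0, "both": 0}
    for kinds in targets.values():
        key = "both" if len(kinds) > 1 else next(iter(kinds))
        counts[key] += 1
    return counts


# Made-up hits: s1 matches two plasmids (counted once), s3 fails the cutoff,
# s4 matches both a plasmid and a virus.
hits = [
    ("s1", "plasmid", 1e-5),
    ("s1", "plasmid", 1e-4),
    ("s2", "virus", 0.01),
    ("s3", "virus", 0.2),
    ("s4", "plasmid", 0.001),
    ("s4", "virus", 0.002),
]
counts = summarize_targets(hits)
```

Collapsing matches per spacer before counting is what prevents a single spacer with many database hits from dominating the targeting proportions.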
Networks were visualized in Gephi with layout generated by a combination of OpenOrd and Noverlap algorithms. For calculating taxonomic confinement of spacer-protospacer matches between plasmids, each pair of plasmids connected by at least one spacer-protospacer match was counted as one matching pair. Cross-targeting plasmids were included as two separate plasmid pairs. Confinement was calculated as the number of matches found exclusively within a specific taxonomic rank, such that each plasmid-plasmid pair was only counted once. For estimating confinement of random spacer-protospacer matching, the taxonomic annotations were permuted among the plasmid-plasmid pairs with observed spacer-protospacer matches. This was repeated 100 times and the median number of matches was used as an estimate of confinement for hypothetically random matches. For estimating targeting bias towards conjugative versus non-conjugative plasmids, each unique spacer was counted with a weight of 1, split in proportion to its number of matches to conjugative and non-conjugative plasmids, respectively. For example, a spacer matching four conjugative plasmids and one non-conjugative plasmid is counted as 0.8 for conjugative matches and 0.2 for non-conjugative matches. The spacer-protospacer matches identified for plasmid and associated host chromosome-derived CRISPR array contents are found in Supplementary Datasheet S2.
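The spacer weighting scheme for conjugative versus non-conjugative targets can be written as a short Python sketch. The input layout and the function name conjugative_weights are hypothetical; the arithmetic follows the worked example in the text, in which every unique spacer contributes a total weight of 1.

```python
def conjugative_weights(matches):
    """Sum per-spacer fractional weights for conjugative and non-conjugative targets.

    matches: iterable of (spacer_id, target_is_conjugative) pairs, one per
             spacer-protospacer match (hypothetical layout).
    Returns (conjugative_total, non_conjugative_total).
    """
    per_spacer = {}
    for spacer_id, is_conjugative in matches:
        tally = per_spacer.setdefault(spacer_id, [0, 0])
        tally[0 if is_conjugative else 1] += 1

    conj_total = 0.0
    non_conj_total = 0.0
    for n_conj, n_non in per_spacer.values():
        n_all = n_conj + n_non
        conj_total += n_conj / n_all   # each spacer contributes a total weight of 1
        non_conj_total += n_non / n_all
    return conj_total, non_conj_total


# The worked example from the text: four conjugative matches and one non-conjugative match.
one_spacer = [("s1", True)] * 4 + [("s1", False)]
conj, non_conj = conjugative_weights(one_spacer)

# Adding a second spacer that only matches a non-conjugative plasmid.
two_spacers = one_spacer + [("s2", False)]
conj2, non_conj2 = conjugative_weights(two_spacers)
```

For the single spacer, the totals are 0.8 and 0.2; with the second spacer added they become 0.8 and 1.2, so the grand total always equals the number of unique spacers with at least one match.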
RESULTS
CRISPR-Cas systems are common on plasmids
We scanned the largest curated collection of complete wildtype bacterial (27 939) and archaeal (253) plasmid genomes in search of CRISPR and cas loci. To reduce the confounding effect of sequencing biases, we removed identical or highly similar plasmids from further analyses. This resulted in a non-redundant dataset of 17 608 bacterial and 220 archaeal plasmid sequences, spanning 30 phyla and 771 genera. For a total of 13 265 non-redundant plasmids, we were able to collect the corresponding set of host chromosome sequences (n = 6979). Overall, our survey identified a total of 338 complete and 313 putatively incomplete loci (207 orphan CRISPR arrays and 106 orphan cas), indicating that ∼3% of sequenced plasmids naturally carry one or more CRISPR and/or cas loci (Figure 1A, top). This contrasts with the much higher incidence we found on the plasmid-associated host chromosomes, which amounted to 42.3% (42% in bacteria and 63% in archaea). However, since chromosomes are substantially larger than plasmids, we corrected their incidence to genome sequence length (per Mbp) (55). Strikingly, we found that CRISPR-Cas components are on average more prevalent across plasmid sequences (Figure 1A, bottom), suggesting a selective advantage for many plasmids to carry these systems.
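The length correction mentioned above amounts to expressing CRISPR-Cas incidence as loci per Mbp of sequence. A minimal sketch is given below; it assumes that densities are computed per replicon and then averaged, which is one plausible reading of "on average" and not something the text states explicitly. The function name and all numbers are invented for illustration.

```python
def mean_loci_per_mbp(replicons):
    """Mean density of loci per Mbp across replicons.

    replicons: iterable of (n_loci, length_bp) pairs (hypothetical layout).
    Assumption: densities are computed per replicon and then averaged.
    """
    densities = [n_loci * 1_000_000 / length_bp for n_loci, length_bp in replicons]
    return sum(densities) / len(densities)


# Invented numbers: small plasmids with few loci versus large chromosomes with more loci.
plasmids = [(1, 100_000), (0, 50_000)]
chromosomes = [(2, 4_000_000), (1, 5_000_000)]
plasmid_density = mean_loci_per_mbp(plasmids)        # mean of 10 and 0 loci per Mbp
chromosome_density = mean_loci_per_mbp(chromosomes)  # mean of 0.5 and 0.2 loci per Mbp
```

The toy numbers show how a group with a lower raw incidence can still have the higher density once sequence length is taken into account.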
Whereas most detected loci represent complete CRISPR-Cas systems, solitary (orphan) CRISPR arrays and cas operons were also commonly identified. These putatively incomplete systems are more frequent on plasmids than chromosomes (Figure 1A). Intriguingly, orphan arrays are significantly shorter than cas-associated CRISPRs (on average 39% shorter, P < 2e–16, negative-binomial generalized linear model; Supplementary Figure S4A), which may reflect the importance of neighboring adaptation modules (cas1-2) for array expansion and maintenance. Furthermore, plasmid-encoded systems were less frequently associated with adaptation modules than chromosomal systems (36% vs. 88%, respectively; Supplementary Figure S5), yet the array sizes of cas-associated CRISPRs did not differ significantly. Although the reasons for the lack of adaptation modules are poorly understood, it is a characteristic feature of many other MGE-encoded CRISPR-Cas systems (e.g. carried by phages and transposons) that is thought to be compensated via in trans use of chromosomally-encoded adaptation machinery (4,7,10,13). Finally, we observed that host chromosomes tend to carry more CRISPR arrays than plasmids; 68% of chromosomes encoding CRISPR have more than 1 array, in contrast to 36% of plasmids (Supplementary Figure S4B). Together, our results underscore a pervasive acquisition of CRISPR-Cas components by plasmids and considerable differences in the composition of plasmid- and chromosome-encoded systems.
Plasmid CRISPR-Cas subtype diversity is rich and distinct from chromosomes
We then sought to investigate the diversity of CRISPR-Cas systems across plasmid genomes. Our analysis revealed a broad range of plasmid-encoded subtypes and marked differences in their abundances (Figure 1B). Except for type VI, representatives of all CRISPR-Cas types were identified in plasmids. Overall, Class 1 systems dominate the plasmid landscape (e.g. subtypes I-E, I-B, III-B and IV-A3), whereas Class 2 systems are poorly represented, with the notable exception of subtype V-F.
Next, we explored whether the subtype distributions on plasmids differed from those found across plasmid-associated host chromosomes. Inspection of the distribution and prevalence of CRISPR-Cas subtypes on chromosomes revealed notable differences (Supplementary Figures S6 and S7). An indicator analysis (see Materials and Methods for details) showed that IV-A3, V-F, IV-B, III-B and IV-A1 are significantly enriched subtypes for plasmid genomes when comparing all plasmids and their associated host chromosomes. A direct comparison, including only plasmid-chromosome pairs where both have CRISPR-Cas components, showed that IV-A3 is enriched on plasmids and I-D, V-J and I-F are relatively more prevalent on chromosomes (Supplementary Figure S7). Furthermore, our analyses revealed that the higher abundance of orphan cas loci on plasmids (Figure 1A) is largely driven by the type IV-B systems which, consistent with previous reports (4,7), are primarily encoded on plasmids and lack CRISPR arrays (Figure 1B and Supplementary Figure S7). Although relatively infrequent, we found that some individual plasmids carry multiple CRISPR-Cas systems (44 out of 385 cas-containing loci) (Supplementary Figure S8). Among these, combinations involving type I were most common, primarily paired with type III, IV and V, which may reflect functional compatibility between systems and, possibly, synergistic effects (56,57).
We next examined the diversity of CRISPR-Cas systems on plasmids across taxa to determine the possible influence of host phylogeny on their prevalence and subtype distributions. In agreement with previous surveys across prokaryotic genomes (3,58), our analysis revealed that the abundance of CRISPR-Cas is highly variable across host taxonomy (Figure 1C and Supplementary Figure S9). For instance, while CRISPR-Cas incidence on plasmids from Rhodothermia, Deinococci and Clostridia lies between 19 and 27%, in other taxa their incidence is very low or even zero. Strikingly, the prevalence and diversity of CRISPR-Cas subtypes on plasmids correlates poorly with their abundance across the chromosomes of plasmid-host taxa (Figure 1C), even when directly comparing the pool of plasmid-host chromosome pairs where both the plasmid and associated host chromosome carry CRISPR-Cas (Supplementary Figure S9). These results show distinct CRISPR-Cas compositions for plasmids and their associated host chromosomes, a pattern that likely results from the genetic autonomy of plasmids.
It is noteworthy that most available sequenced plasmids are harbored by members of Gammaproteobacteria, Bacilli and Alphaproteobacteria (Figure 1C), which together represent 84% of all plasmids with a known host. It is therefore important to consider our results in light of this strong inherent database bias, which results from traditionally higher sampling and sequencing rates of cultivable and clinically relevant microbes (59,60). Consequently, given the comparatively rare occurrence of plasmid-encoded CRISPR-Cas in these dominant taxa (Figure 1C), the calculated averaged prevalence for all plasmid-encoded CRISPR-Cas systems (∼3%) is predicted to be an underestimate of their true representation across environments. Taken together, our results indicate that plasmid-encoded CRISPR-Cas loci are frequent in nature and do not simply mirror those found in their host chromosomes, thereby highlighting the influence of distinct selective pressures that promote the recruitment and retention of specific CRISPR-Cas subtypes on plasmids versus chromosomes.
Plasmids contribute to the horizontal dissemination of CRISPR-Cas
The recently proposed bacterial pan-immune model is based on the idea that defense systems are frequently lost and acquired by community members through HGT (61). Therefore, we investigated whether there is a link between plasmid conjugative transmissibility and CRISPR-Cas presence. We specifically focused on proteobacterial plasmids, since high confidence predictions for conjugative transmissibility are limited to this phylum (59,60) and because proteobacterial plasmids dominate the dataset (62% of all non-redundant plasmid genomes).
We detected an enrichment of conjugative transfer functions within plasmids carrying CRISPR-Cas components (over 47%: average of complete systems and orphan loci; Figure 2A), a higher proportion than for plasmids not encoding CRISPR or cas (∼36%; Fisher's exact test: P-value = 5.9e–05; odds-ratio = 2.23). These results support the notion that conjugative plasmids facilitate HGT of CRISPR-Cas systems in the environment and, given the remarkably broad transfer ranges of some proteobacterial plasmids (e.g. IncQ, IncP, IncH and IncN) (62–65), possibly also across distantly related taxa. Less is known about plasmid-transfer modes outside Proteobacteria and their impact on gene exchange networks (59,60). For instance, many plasmids in Gram-positive bacteria transfer via conjugation but their transfer machinery is poorly characterized, thus rendering mobility predictions based on sequence data unreliable (59,66) and highlighting that conjugative plasmids are likely underestimated in our database. Moreover, it is expected that many non-conjugative plasmids transfer horizontally through alternative mechanisms, e.g., via transformation (67), mobilization (68), transduction (69,70), and outer membrane vesicles (71). Therefore, our results underpin the idea that plasmids are major contributors to the active dissemination of CRISPR-Cas systems across microbiomes.
CRISPR-Cas systems are enriched on plasmids of larger sizes
We then sought to examine other biological characteristics of the plasmids and searched especially for common or distinctive patterns shared by CRISPR-Cas-encoding plasmids. We focused on exploring the link between plasmid genome size and the presence of CRISPR-Cas modules. In contrast to the collection of non-CRISPR-Cas-encoding plasmids—which displayed the previously reported bimodal size distribution (59,60)—plasmids carrying CRISPR-Cas components exhibited unimodal distributions, with the peak shifted towards larger genome sizes (180–250 kb on average) (Figure 2B).
Given the relatively large sizes of CRISPR-Cas systems, a bias towards larger genomes is unsurprising and possibly stems from size-related constraints associated with certain plasmid life history strategies. Larger plasmids allocate considerable portions of their genomes to transfer, stabilization and accessory modules that enhance their persistence (17). This is congruent with the observed enrichment of CRISPR-Cas systems on conjugative plasmids (Figure 2A), which are known to be relatively large and show a unimodal size distribution centered around 250 kb (59). Similar genomic streamlining dynamics appear to extend to other MGEs, including phages, where complete CRISPR-Cas systems have been reported in huge phages (>500 kb) (10) but rarely occur in the more common, smaller-sized (pro)viral genomes (7,34). In conclusion, our data show that CRISPR-Cas systems are important components of many plasmid accessory repertoires, and are more frequently associated with plasmids of larger sizes.
Highly uneven distribution of CRISPR-Cas across plasmid Incompatibility groups
Next, we examined whether CRISPR-Cas systems in plasmids have short-lived associations or whether we could identify signs of retention by specific plasmid lineages. A common plasmid classification scheme types plasmids into incompatibility (Inc) groups and is deeply rooted in plasmid eco-evolutionary dynamics, i.e. it is based on the observation that plasmids sharing replication or partitioning components cannot stably propagate within a given cell host lineage (72). We therefore investigated the distribution and prevalence of CRISPR-Cas-containing plasmids across the Inc-typeable fraction of non-redundant plasmids, which corresponds to 29% of all plasmids (98% of which have a host belonging to Proteobacteria) (Figure 2C).
Overall, we found that only a reduced number of Inc types (15/50) include plasmids carrying CRISPR-Cas (Figure 2C and Supplementary Figure S10). Most CRISPR-Cas-encoding plasmids are distinctively concentrated within specific Inc families (e.g. IncH), underscoring the patchy distribution of CRISPR-Cas components across plasmids. Importantly, Inc families are used to infer a degree of genetic relatedness (phylogeny) and ecological cohesiveness, thus typically grouping plasmids that exhibit comparable backbone architecture, host range breadth, propagation mechanism, etc. (59,73). Therefore, our results indicate that some CRISPR-Cas systems are acquired by specific plasmid lineages (i.e. groups of plasmids sharing similar ecological strategies, niches and a related evolutionary trajectory) and are thus maintained stably through evolutionary timescales, presumably due to their adaptive benefits.
Plasmid spacer contents reveal a robust plasmid-targeting bias
We then focused on understanding the possible function(s) of plasmid-encoded CRISPR-Cas systems. CRISPR arrays are uniquely suited to provide ecological and biological insights; the origins of many spacer sequences can be backtracked, providing valuable clues about the functions of CRISPR-Cas and their selective benefits (22,74–76). It has been considered that the primary role of chromosome-encoded CRISPR-Cas systems is to protect cells against viruses (22,58,77). This raised the question as to whether plasmid-encoded CRISPR-Cas components reinforce this function, especially given that many plasmids encode genes that enhance the fitness of their hosts against diverse environmental threats (e.g. antimicrobial resistance) (17,78).
All spacer sequences (n = 11 080) were extracted from the bacterial and archaeal plasmid-encoded CRISPR arrays and searched against comprehensive virus and plasmid sequence datasets (Materials and Methods). For comparison, analogous searches were performed with the collection of spacers originating from: (i) the host chromosomes associated with the plasmids in this study (a total of 96 870 spacers) and (ii) plasmid-host chromosome pairs where both the plasmid and associated host chromosome carry at least one CRISPR array (4816 plasmid spacers and 10 315 chromosomal spacers). Only a limited fraction of spacers yielded significant matches to protospacer sequences (plasmids: 11.1%; hosts: 12.9%), consistent with previous studies (4,21,22,75,79–81). This is ascribed to a combination of factors, including the paucity of mobilome sequences across public databases and the high mutation rates of MGE protospacers, presumably to escape CRISPR-Cas targeting (21,22,82).
Subsequently, we examined the origins of these protospacer targets. Strikingly, a larger fraction of plasmid spacers matched sequences from other plasmids (66%), while a substantially smaller fraction matched viruses (27%) (Figure 3A). In contrast, the spacer contents originating from plasmid-host chromosomes revealed the opposite trend: a larger proportion of spacers matched viral sequences compared to plasmids (62% and 24%, respectively) (Figure 3B; Supplementary Figure S11), consistent with a primary antiviral role of chromosomal CRISPR-Cas systems (12,22,58,76,77,83). Importantly, a more direct examination of plasmid-host chromosome pairs (limited to comparisons where both parties carry at least one CRISPR-Cas system) revealed an analogous targeting trend (Supplementary Figure S12A and B).
The abundance of plasmid spacers targeting other plasmids raised the question of whether the reported plasmid-targeting preference of type IV CRISPR-Cas systems (4) could be driving this trend, especially given the abundance of type IV spacers within our dataset (12% of plasmid spacers, yielding 48% of the spacers with any match) (Supplementary Figure S13A). However, we found that the plasmid-targeting bias also held true for the majority of other plasmid-encoded CRISPR-Cas subtypes/variants (Figure 3C and Supplementary Figure S12C). In contrast, chromosomal spacers maintained a virus-targeting preference, regardless of CRISPR-Cas subtype (Figure 3C and Supplementary Figure S12C). Furthermore, we found that the plasmid-to-plasmid versus chromosome-to-virus targeting patterns are maintained across the different taxa, implying the existence of a robust biological underpinning of this phenomenon (Figure 3D and Supplementary Figure S12D). Nevertheless, the plasmid-encoded CRISPRs from certain underrepresented taxa (Figure 3D and Supplementary Figure S12D) appear to be enriched with virus-targeting spacers (e.g. Rhodothermia and Cyanobacteria), suggesting that enhancement of antiviral host immunity could still be an important evolved strategy for some groups of plasmids.
A reticulated web of CRISPR-based plasmid-plasmid targeting
The identification of extensive plasmid-plasmid targeting provides a practical framework for investigating plasmid eco-evolutionary dynamics and offers a unique opportunity to gain insights into HGT routes. This prompted us to build a global network of plasmid-plasmid interactions based on the linkage information provided by the CRISPR-targeting data (Figure 4A). The corresponding directed graph consists of de-replicated plasmid genomes (nodes), connected by the predicted spacer-protospacer matches (edges). Overall, our analyses revealed a network with a pronounced modular structure, where a reduced number of densely connected clusters accrue the majority of plasmids, and links between clusters are very sparse. A highly visible trend across the targeting network is the clustering of plasmids according to host taxonomy, with the two largest clusters consisting of plasmids from either Enterobacteriales or Bacillales. However, generalisations based on such a trend should be made with caution and viewed in the context of the historical sequencing bias towards plasmids from cultivable and/or clinical strains. For example, inferring that plasmid targeting is a distinctive phenomenon among Enterobacteriales or Bacillales plasmids could be an inaccurate assumption, since plasmids carrying CRISPR-Cas are relatively rare in these taxa (Figure 1C), despite their sequences comprising the overwhelming majority of sequenced plasmids (Figure 1C). As more accurate sampling and sequence representation of plasmid diversity becomes available, a more clear understanding of plasmid-plasmid targeting will emerge.
Notably, the clustering analysis demonstrates a pronounced inverse relationship between the number of plasmid connections and the phylogenetic hierarchy of the cognate bacterial hosts. Whereas targeting between plasmids within a single species, genus and family accounts for the bulk of all predictions (∼42%, 28% and 28%, respectively), matches confined to higher taxa comprise less than 2% (Figure 4B). Indeed, a closer examination of the plasmid-plasmid targeting network in Gammaproteobacteria revealed abundant links between plasmids from different genera (Supplementary Figure S14). These results underscore that taxonomic boundaries represent a major hurdle for plasmid dissemination. Although some plasmids are able to transfer between distantly-related taxa, their long-term evolutionary host range is primarily constrained to a narrower group of phylogenetically related hosts (73,84). Furthermore, acquisition of spacers from plasmids sharing a similar host range is expected to be more frequent due to the conceivably higher rates of encounters within cells. From a CRISPR-targeting standpoint, spacer retention is also likely influenced by the selective advantage that spacers can provide in plasmid-plasmid competition dynamics.
Given the self-transmissible properties of conjugative plasmids, we wondered whether their effective spread through bacterial populations could render them more common targets for CRISPR-Cas than non-conjugative plasmids. In support of this, we observed an over-representation of plasmid spacers predicted to target conjugative plasmids (Figure 4C). This may indicate that conjugative invasion is detrimental to plasmids already established in a cell, which is consistent with previous reports of plasmid-encoded mechanisms specifically directed towards preventing the entry of conjugative plasmids (e.g. fertility inhibition strategies and entry exclusion systems) (85). Interestingly, we found that chromosomal spacers showed a relative underrepresentation of spacers targeting conjugative plasmids, suggesting that this type of plasmid may be less detrimental to bacteria, possibly owing to the fitness benefits associated with the adaptive gene cargos that they frequently carry (17,59).
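One generic way to ask whether conjugative plasmids are over-represented among targeted plasmids is a one-sided hypergeometric test. The Python sketch below shows the calculation using only the standard library. The statistical procedure behind Figure 4C is not described in this section, so this test, the function name and the toy counts are assumptions for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: total number of candidate plasmids
    K: number of conjugative plasmids among them
    n: number of targeted plasmids
    k: number of targeted plasmids that are conjugative
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("counts are inconsistent")
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts, chosen only to show the call signature.
p_value = hypergeom_upper_tail(k=2, K=5, n=2, N=10)
```

A small upper-tail probability would indicate that conjugative plasmids are targeted more often than expected if targets were drawn at random from the candidate set. With many spacers per plasmid, the unit of counting (spacers versus de-replicated target plasmids) changes the result and should be chosen deliberately.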
DISCUSSION
The study of CRISPR-Cas biology has primarily focused on chromosomally-encoded systems and their adaptive antiviral functions in bacteria and archaea. While recent work has started to uncover the common association of CRISPR-Cas systems with diverse MGEs and the importance of this phenomenon for CRISPR-Cas ecology and evolution (7), their recruitment by plasmids has remained largely unexplored. Here, we present the first comprehensive analysis of CRISPR-Cas systems across the largest curated dataset of wildtype bacterial and archaeal plasmids. We show that CRISPR-Cas loci are pervasive accessory components of many plasmids and span a broad diversity of systems, including subtype representatives covering five out of the six known types. Interestingly, we found that certain plasmids carry multiple CRISPR-Cas systems (Figure 5A). The incidence of plasmid-encoded systems is highly uneven across taxa (ranging from 0 to 30%, but averaging ∼3%) and the subtype diversity does not simply reflect the CRISPR-Cas contents found in the chromosomes of their hosts. Our results thus underscore the genetic independence of plasmids and the influence of distinct evolutionary pressures in the acquisition and retention of CRISPR-Cas on plasmids versus their associated host chromosomes.
Intriguingly, putatively incomplete loci were more abundant on plasmids than on chromosomes, although less abundant than previously reported (34). It has been suggested that orphan CRISPR arrays and cas loci may be remnants of decaying CRISPR-Cas systems (58). Their relatively higher occurrence on plasmids could indicate that CRISPR-Cas systems erode faster on plasmids, or that orphan components are recruited and/or selectively maintained to perform important, but as yet unknown, biological functions. Orphan CRISPR arrays could, for instance, employ host Cas machinery in trans (86,87) or facilitate plasmid-chromosome integration via recombination between plasmid and host-encoded CRISPRs (88,89). On the other hand, the higher proportion of orphan components may be an artefact of CRISPR-Cas prediction tools being unable to detect a conceivably greater diversity of uncharted (sub)types across plasmids. Indeed, novel subtypes have recently been identified on diverse MGEs (4,7,10,13).
The observed enrichment of conjugative functions across CRISPR-Cas-encoding plasmids, together with the expected underestimation of transmissible plasmids in our database (i.e. due to unreliable bioinformatic prediction methods and unknown plasmid mobility mechanisms), suggests an active contribution of plasmids to the conspicuous dissemination of CRISPR-Cas systems across microbiomes. These results are in agreement with the proposed bacterial pan-immune concept, where defense systems are continually lost and (re)gained by bacteria through HGT mechanisms (61), and are further consistent with the common observation of restriction-modification and toxin-antitoxin systems on plasmids (90,91).
Notably, we found that plasmid-encoded CRISPR arrays tend to carry a larger fraction of spacers predicted to target other plasmids, while systems encoded on the chromosomes of plasmid hosts show the commonly observed targeting bias towards viruses. This contrasting targeting preference was consistently observed across taxa and the different CRISPR-Cas subtypes, indicating that plasmids may primarily exploit CRISPR-Cas systems to target other plasmids, and that these systems thus likely play a less dominant role in host protection against viral predators (Figure 5B). These observations extend the hypothesis that the main function of plasmid-encoded type IV CRISPR-Cas systems is to eliminate plasmid competitors (4,11). Interestingly, we found a number of cases of plasmid cross-targeting pairs (26 in total, 4 de-replicated), where CRISPR-Cas-encoding plasmids are predicted to target each other upon crossing paths in a host cell (Figure 5C). We also found 29 examples of de-replicated plasmids predicted to target other plasmids within the same cell (Figure 5D), which could indicate the presence of counter-defense strategies to avoid targeting, such as plasmid-encoded anti-CRISPRs (Acrs) (27,28). Although we failed to identify known Acrs across the co-residing targeted plasmids, recent work describes an analogous co-evolutionary arms race between a conjugative island-encoded I-C CRISPR-Cas system and diverse MGE-encoded Acrs in Pseudomonas aeruginosa (92).
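In graph terms, cross-targeting pairs correspond to reciprocal edges in the directed targeting network, and targeting within the same cell corresponds to edges whose two plasmids share a host. The Python sketch below illustrates both queries. The function names, the `host_of` mapping and the toy identifiers are hypothetical, and the code is not the procedure used to derive the counts reported above.

```python
def cross_targeting_pairs(edges):
    """Return unordered plasmid pairs that are predicted to target each other."""
    edge_set = {(source, target) for source, target in edges if source != target}
    pairs = {
        tuple(sorted((source, target)))
        for source, target in edge_set
        if (target, source) in edge_set
    }
    return sorted(pairs)


def co_resident_targeting(edges, host_of):
    """Return directed edges whose source and target plasmids share a host.

    `host_of` maps a plasmid ID to the ID of the strain it was sequenced from.
    Plasmids missing from the mapping are skipped.
    """
    hits = []
    for source, target in edges:
        if source == target:
            continue
        host_source = host_of.get(source)
        if host_source is not None and host_source == host_of.get(target):
            hits.append((source, target))
    return hits


# Hypothetical toy data: IDs are placeholders.
toy_edges = [("pA", "pB"), ("pB", "pA"), ("pA", "pC"), ("pD", "pE")]
toy_hosts = {"pA": "strain1", "pC": "strain1", "pD": "strain2", "pE": "strain3"}
toy_pairs = cross_targeting_pairs(toy_edges)
toy_co_resident = co_resident_targeting(toy_edges, toy_hosts)
```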
Together, our results are consistent with previous reports of the co-option of CRISPR-Cas systems, or components thereof, by different MGEs for waging inter-MGE conflicts. For example, the ICP1 Vibrio cholerae phage encodes a type I-F CRISPR-Cas system to restrict the phage satellite PLE, an MGE that parasitizes ICP1 (8,16). Additionally, some giant phages and other viruses carry either complete CRISPR-Cas systems or ‘mini-arrays’ that might contribute to inter-viral conflicts (7,9,10,93). Our findings thus support the ‘guns for hire’ concept (94), whereby CRISPR-Cas systems are continually repurposed by different genetic entities. Because similar entities are expected to compete more strongly due to niche overlap (e.g. space and resources), it is not surprising to observe CRISPR-Cas driven inter-viral and inter-plasmid conflicts. Moreover, the higher proportions of virus-derived chromosomal spacers found here, and earlier, illustrate how viruses exert a stronger selection on hosts than plasmids do. Indeed, while viruses often kill their host cell, plasmids tend to only affect fitness, and can be beneficial under certain conditions. Together, these results suggest that retention of CRISPR spacer content is primarily shaped by the selective advantage that single spacers confer on the genetic entities carrying them and, to a lesser extent, by any possible biases inherent to the spacer acquisition and targeting mechanisms.
More broadly, our findings have practical implications beyond CRISPR-Cas biology. Plasmid sequences may hide an uncharted diversity of CRISPR-Cas systems with promising biotechnological applications, e.g. in genome engineering. Furthermore, plasmid-derived CRISPRs can be exploited to infer a plasmid's direct relationships with other elements across evolutionary timescales. When available, spacer-protospacer match prediction data could comprise an added layer of information during retrospective plasmid host-range inference analyses, similar to how chromosomal CRISPR contents are leveraged for bioinformatic deconvolution of virus-host associations (81,95–98). Moreover, the distinctive spacer acquisition bias at the leader end of most CRISPR arrays (82,99) suggests a promising resource for extracting chronological information about plasmid dissemination routes. Such analyses may become particularly valuable in the study of clinically relevant plasmids (e.g. those carrying antibiotic resistance or virulence determinants), for which plasmid typing and epidemiological tracking are crucial but currently difficult to infer through sequence analyses alone (64,73,100,101).
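To make the chronological idea concrete, the Python sketch below reads a CRISPR array from the leader end and lists the plasmids matched by its spacers in order of inferred encounter, most recent first. It is a toy illustration that assumes strictly leader-end acquisition with no spacer loss or array recombination; the function name and the plasmid identifiers are hypothetical.

```python
def encounter_order(spacer_targets):
    """Order targeted plasmids from most recent to oldest inferred encounter.

    `spacer_targets` lists, for each spacer from the leader-proximal end
    onwards, the ID of the plasmid it is predicted to match, or None when
    the spacer has no match. Each plasmid is reported once, at the
    position of its leader-most (most recent) spacer.
    """
    seen = set()
    order = []
    for target in spacer_targets:
        if target is not None and target not in seen:
            seen.add(target)
            order.append(target)
    return order


# Hypothetical array: leader-proximal spacer first; IDs are placeholders.
toy_array = ["pC", None, "pA", "pC", "pB"]
toy_order = encounter_order(toy_array)
```

The ordering is only relative and holds within a single array. Comparing arrays from different plasmids would require additional evidence, such as shared ancestral spacers at the leader-distal end.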
Overall, CRISPR-Cas systems constitute powerful barriers against MGE-mediated HGT in microbial communities. While the investigation of CRISPR-Cas biology has focused on chromosomally-encoded systems, our work uncovers their pervasive association with plasmids across a broad phylogenetic range, where they appear to play a major role in mediating plasmid-plasmid conflicts. We anticipate that MGE–MGE warfare constitutes an important, yet largely overlooked, factor influencing the dynamics of gene flow across microbiomes.
DATA AVAILABILITY
Scripts for downloading data and reproducing all analyses are available at https://github.com/Russel88/CRISPRCas_on_Plasmids.
ACKNOWLEDGEMENTS
We are thankful to Marina Pinilla for her contributions in the creative design of figures. We thank members of our laboratories for helpful discussions and mimosas. We are especially thankful to Sarah Camara, Fabienne Benz, Mario R. Mestre, Leah M. Smith and Nils Birkholz for valuable input and helpful discussions.
Authorship contributions: Conceptualization: R.P.-R., D.M.-M. and J.R. Direction and planning: R.P.-R. Investigation and computational analyses: J.R., D.M.-M., R.P.-R. and S.A.S. Writing: R.P.-R., with support from all authors.
Contributor Information
Rafael Pinilla-Redondo, Section of Microbiology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark; Department of Technological Educations, University College Copenhagen, Sigurdsgade 26, 2200 Copenhagen, Denmark.
Jakob Russel, Section of Microbiology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark.
David Mayo-Muñoz, Section of Microbiology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark; Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand.
Shiraz A Shah, Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Ledreborg Alle 34, 2820 Gentofte, Denmark.
Roger A Garrett, Danish Archaea Centre, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark.
Joseph Nesme, Section of Microbiology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark.
Jonas S Madsen, Section of Microbiology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark.
Peter C Fineran, Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand; Bio-Protection Research Centre, University of Otago, Dunedin, New Zealand.
Søren J Sørensen, Section of Microbiology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
R.P.-R. was financed by the Independent Research Fund Denmark, InTrans Project [8022-00322B]; Lundbeck Foundation (Lundbeckfonden), postdoc grant [R347-2020-2346]; J.R. was supported by the Novo Nordisk Foundation; D.M.-M. was supported by a University of Otago Doctoral Scholarship; P.C.F. was supported by the Bio-Protection Research Centre (Tertiary Education Commission). Funding for open access charge: Section of Microbiology, University of Copenhagen.
Conflict of interest statement. None declared.
REFERENCES
- 1. Barrangou R., Marraffini L.A. CRISPR-Cas systems: prokaryotes upgrade to adaptive immunity. Mol. Cell. 2014; 54:234–244.
- 2. Hille F., Richter H., Wong S.P., Bratovič M., Ressel S., Charpentier E. The biology of CRISPR-Cas: backward and forward. Cell. 2018; 172:1239–1259.
- 3. Makarova K.S., Wolf Y.I., Iranzo J., Shmakov S.A., Alkhnbashi O.S., Brouns S.J.J., Charpentier E., Cheng D., Haft D.H., Horvath P. et al. Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants. Nat. Rev. Microbiol. 2020; 18:67–83.
- 4. Pinilla-Redondo R., Mayo-Muñoz D., Russel J., Garrett R.A., Randau L., Sørensen S.J., Shah S.A. Type IV CRISPR-Cas systems are highly diverse and involved in competition between plasmids. Nucleic Acids Res. 2019; 48:2000–2012.
- 5. Pickar-Oliver A., Gersbach C.A. The next generation of CRISPR–Cas technologies and applications. Nat. Rev. Mol. Cell Biol. 2019; 20:490–507.
- 6. Barrangou R., Doudna J.A. Applications of CRISPR technologies in research and beyond. Nat. Biotechnol. 2016; 34:933–941.
- 7. Faure G., Shmakov S.A., Yan W.X., Cheng D.R., Scott D.A., Peters J.E., Makarova K.S., Koonin E.V. CRISPR–Cas in mobile genetic elements: counter-defence and beyond. Nat. Rev. Microbiol. 2019; 17:513–525.
- 8. McKitterick A.C., LeGault K.N., Angermeyer A., Alam M., Seed K.D. Competition between mobile genetic elements drives optimization of a phage-encoded CRISPR-Cas system: insights from a natural arms race. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2019; 374:20180089.
- 9. Medvedeva S., Liu Y., Koonin E.V., Severinov K., Prangishvili D., Krupovic M. Virus-borne mini-CRISPR arrays are involved in interviral conflicts. Nat. Commun. 2019; 10:5204.
- 10. Al-Shayeb B., Sachdeva R., Chen L.-X., Ward F., Munk P., Devoto A., Castelle C.J., Olm M.R., Bouma-Gregson K., Amano Y. et al. Clades of huge phages from across Earth's ecosystems. Nature. 2020; 578:425–431.
- 11. Crowley V.M., Catching A., Taylor H.N., Borges A.L., Metcalf J., Bondy-Denomy J., Jackson R.N. A type IV-A CRISPR-Cas system in Pseudomonas aeruginosa mediates RNA-guided plasmid interference in vivo. CRISPR J. 2019; 2:434–440.
- 12. Nasko D.J., Ferrell B.D., Moore R.M., Bhavsar J.D., Polson S.W., Wommack K.E. CRISPR spacers indicate preferential matching of specific Virioplankton genes. MBio. 2019; 10:e02651-18.
- 13. Peters J.E., Makarova K.S., Shmakov S., Koonin E.V. Recruitment of CRISPR-Cas systems by Tn7-like transposons. Proc. Natl Acad. Sci. U.S.A. 2017; 114:E7358–E7366.
- 14. Klompe S.E., Vo P.L.H., Halpin-Healy T.S., Sternberg S.H. Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration. Nature. 2019; 571:219–225.
- 15. Strecker J., Ladha A., Gardner Z., Schmid-Burgk J.L., Makarova K.S., Koonin E.V., Zhang F. RNA-guided DNA insertion with CRISPR-associated transposases. Science. 2019; 365:48–53.
- 16. Seed K.D., Lazinski D.W., Calderwood S.B., Camilli A. A bacteriophage encodes its own CRISPR/Cas adaptive response to evade host innate immunity. Nature. 2013; 494:489–491.
- 17. Norman A., Hansen L.H., Sørensen S.J. Conjugative plasmids: vessels of the communal gene pool. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2009; 364:2275–2289.
- 18. Harrison E., Brockhurst M.A. Plasmid-mediated horizontal gene transfer is a coevolutionary process. Trends Microbiol. 2012; 20:262–267.
- 19. MacLean R.C., San Millan A. Microbial evolution: towards resolving the plasmid paradox. Curr. Biol. 2015; 25:R764–R767.
- 20. Lili L.N., Britton N.F., Feil E.J. The persistence of parasitic plasmids. Genetics. 2007; 177:399–405.
- 21. Touchon M., Rocha E.P.C. The small, slow and specialized CRISPR and anti-CRISPR of Escherichia and Salmonella. PLoS One. 2010; 5:e11126.
- 22. Shmakov S.A., Sitnik V., Makarova K.S., Wolf Y.I., Severinov K.V., Koonin E.V. The CRISPR spacer space is dominated by sequences from species-specific mobilomes. MBio. 2017; 8:01307–01317.
- 23. Westra E.R., Staals R.H.J., Gort G., Høgh S., Neumann S., de la Cruz F., Fineran P.C., Brouns S.J.J. CRISPR-Cas systems preferentially target the leading regions of MOBF conjugative plasmids. RNA Biol. 2013; 10:749–761.
- 24. Marraffini L.A., Sontheimer E.J. CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA. Science. 2008; 322:1843–1845.
- 25. Hatoum-Aslan A., Maniv I., Samai P., Marraffini L.A. Genetic characterization of antiplasmid immunity through a type III-A CRISPR-Cas system. J. Bacteriol. 2014; 196:310–317.
- 26. Garneau J.E., Dupuis M.-È., Villion M., Romero D.A., Barrangou R., Boyaval P., Fremaux C., Horvath P., Magadán A.H., Moineau S. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature. 2010; 468:67–71.
- 27. Mahendra C., Christie K.A., Osuna B.A., Pinilla-Redondo R., Kleinstiver B.P., Bondy-Denomy J. Author correction: Broad-spectrum anti-CRISPR proteins facilitate horizontal gene transfer. Nat. Microbiol. 2020; 5:620–629.
- 28. Pinilla-Redondo R., Shehreen S., Marino N.D., Fagerlund R.D., Brown C.M., Sørensen S.J., Fineran P.C., Bondy-Denomy J. Discovery of multiple anti-CRISPRs highlights anti-defense gene clustering in mobile genetic elements. Nat. Commun. 2020; 11:5652.
- 29. Millen A.M., Horvath P., Boyaval P., Romero D.A. Mobile CRISPR/Cas-mediated bacteriophage resistance in Lactococcus lactis. PLoS One. 2012; 7:e51663.
- 30. Özcan A., Pausch P., Linden A., Wulf A., Schühle K., Heider J., Urlaub H., Heimerl T., Bange G., Randau L. Type IV CRISPR RNA processing and effector complex formation in Aromatoleum aromaticum. Nat Microbiol. 2019; 4:89–96.
- 31. McDonald N.D., Regmi A., Morreale D.P., Borowski J.D., Boyd E.F. CRISPR-Cas systems are present predominantly on mobile genetic elements in Vibrio species. BMC Genomics. 2019; 20:105.
- 32. Maier L.-K., Dyall-Smith M., Marchfelder A. The adaptive immune system of Haloferax volcanii. Life. 2015; 5:521–537.
- 33. Scholz I., Lange S.J., Hein S., Hess W.R., Backofen R. CRISPR-Cas systems in the Cyanobacterium synechocystis sp. PCC6803 exhibit distinct processing pathways involving at least two Cas6 and a Cmr2 protein. PLoS One. 2013; 8:e56470.
- 34. Bernheim A., Bikard D., Touchon M., Rocha E.P.C. Atypical organizations and epistatic interactions of CRISPRs and cas clusters in genomes and their mobile genetic elements. Nucleic Acids Res. 2020; 48:748–760.
- 35. Lange S.J., Alkhnbashi O.S., Rose D., Will S., Backofen R. CRISPRmap: an automated classification of repeat conservation in prokaryotic adaptive immune systems. Nucleic Acids Res. 2013; 41:8034–8044.
- 36. Godde J.S., Bickerton A. The repetitive DNA elements called CRISPRs and their associated genes: evidence of horizontal transfer among prokaryotes. J. Mol. Evol. 2006; 62:718–729.
- 37. Faure G., Makarova K.S., Koonin E.V. CRISPR-Cas: complex functional networks and multiple roles beyond adaptive immunity. J. Mol. Biol. 2019; 431:3–20.
- 38. Letunic I., Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021; 49:293–296.
- 39. Bastian M., Heymann S., Jacomy M. Gephi: an open source software for exploring and manipulating networks. ICWSM. 2009; 3: 10.13140/2.1.1341.1520.
- 40. Galata V., Fehlmann T., Backes C., Keller A. PLSDB: a resource of complete bacterial plasmids. Nucleic Acids Res. 2019; 47:D195–D202.
- 41. Chaumeil P.-A., Mussig A.J., Hugenholtz P., Parks D.H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019; 36:1925–1927.
- 42. Olm M.R., Brown C.T., Brooks B., Banfield J.F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 2017; 11:2864–2868.
- 43. Marçais G., Delcher A.L., Phillippy A.M., Coston R., Salzberg S.L., Zimin A. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 2018; 14:e1005944.
- 44. Couvin D., Bernheim A., Toffano-Nioche C., Touchon M., Michalik J., Néron B., Rocha E.P.C., Vergnaud G., Gautheret D., Pourcel C. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 2018; 46:W246–W251.
- 45. Hyatt D., Chen G.-L., Locascio P.F., Land M.L., Larimer F.W., Hauser L.J. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010; 11:119.
- 46. Pourcel C., Touchon M., Villeriot N., Vernadet J.-P., Couvin D., Toffano-Nioche C., Vergnaud G. CRISPRCasdb a successor of CRISPRdb containing CRISPR arrays and cas genes from complete genome sequences, and tools to download and query lists of repeats and spacers. Nucleic Acids Res. 2020; 48:D535–D544.
- 47. Zhang Q., Ye Y. Not all predicted CRISPR-Cas systems are equal: isolated cas genes and classes of CRISPR like elements. BMC Bioinformatics. 2017; 18:92.
- 48. Oliveira P.H., Prather K.J., Prazeres D.M.F., Monteiro G.A. Analysis of DNA repeats in bacterial plasmids reveals the potential for recurrent instability events. Appl. Microbiol. Biotechnol. 2010; 87:2157–2167.
- 49. Giraldo R., Fernández-Tresguerres M.E. Twenty years of the pPS10 replicon: insights on the molecular mechanism for the activation of DNA replication in iteron-containing bacterial plasmids. Plasmid. 2004; 52:69–83.
- 50. Chattoraj D.K. Control of plasmid DNA replication by iterons: no longer paradoxical. Mol. Microbiol. 2000; 37:467–476.
- 51. Russel J., Pinilla-Redondo R., Mayo-Muñoz D., Shah S.A., Sørensen S.J. CRISPRCasTyper: automated identification, annotation, and classification of CRISPR-Cas Loci. CRISPR J. 2020; 3:462–469.
- 52. Robertson J., Nash J.H.E. MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. Microb. Genomics. 2018; 8:e000206.
- 53. Roux S., Páez-Espino D., Chen I.-M.A., Palaniappan K., Ratner A., Chu K., Reddy T.B.K., Nayfach S., Schulz F., Call L. et al. IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses. Nucleic Acids Res. 2021; 49:D764–D775.
- 54. Pearson W.R. FASTA search programs. eLS. 2014; 10.1002/9780470015902.a0005255.pub2.
- 55. Madsen J.S., Hylling O., Jacquiod S., Pécastaings S., Hansen L.H., Riber L., Vestergaard G., Sørensen S.J. An intriguing relationship between the cyclic diguanylate signaling system and horizontal gene transfer. ISME J. 2018; 12:2330–2334.
- 56. Silas S., Lucas-Elio P., Jackson S.A., Aroca-Crevillén A., Hansen L.L., Fineran P.C., Fire A.Z., Sánchez-Amat A. Type III CRISPR-Cas systems can provide redundancy to counteract viral escape from type I systems. Elife. 2017; 6:e27601.
- 57. Hoikkala V., Ravantti J., Díez-Villaseñor C., Tiirola M., Conrad R.A., McBride M.J., Moineau S., Sundberg L.-R. Cooperation between different CRISPR-Cas types enables adaptation in an RNA-Targeting system. mBio. 2021; 12:e03338-20.
- 58. Makarova K.S., Wolf Y.I., Alkhnbashi O.S., Costa F., Shah S.A., Saunders S.J., Barrangou R., Brouns S.J.J., Charpentier E., Haft D.H. et al. An updated evolutionary classification of CRISPR–Cas systems. Nat. Rev. Microbiol. 2015; 13:722.
- 59. Smillie C., Garcillán-Barcia M.P., Francia M.V., Rocha E.P.C., de la Cruz F. Mobility of plasmids. Microbiol. Mol. Biol. Rev. 2010; 74:434–452.
- 60. Shintani M., Sanchez Z.K., Kimbara K. Genomics of microbial plasmids: classification and identification based on replication and transfer systems and host taxonomy. Front. Microbiol. 2015; 6:242.
- 61. Bernheim A., Sorek R. The pan-immune system of bacteria: antiviral defence as a community resource. Nat. Rev. Microbiol. 2020; 18:113–119.
- 62. Klümper U., Riber L., Dechesne A., Sannazzarro A., Hansen L.H., Sørensen S.J., Smets B.F. Broad host range plasmids can invade an unexpectedly diverse fraction of a soil bacterial community. ISME J. 2015; 9:934–945.
- 63. Pinilla-Redondo R., Olsen A.K., Russel J., de Vries L.E. Conjugative dissemination of plasmids in rapid sand filters: a trojan horse strategy to enhance pesticide degradation in groundwater treatment. 2020; bioRxiv doi: 10.1101/2020.03.06.980565, 08 March 2020, preprint: not peer reviewed.
- 64. Suzuki H., Yano H., Brown C.J., Top E.M. Predicting plasmid promiscuity based on genomic signature. J. Bacteriol. 2010; 192:6045–6055.
- 65. Jain A., Srivastava P. Broad host range plasmids. FEMS Microbiol. Lett. 2013; 348:87–96.
- 66. Garcillán-Barcia M.P., Alvarado A., de la Cruz F. Identification of bacterial plasmids based on mobility and plasmid population biology. FEMS Microbiol. Rev. 2011; 35:936–956.
- 67. Lorenz M.G., Wackernagel W. Bacterial gene transfer by natural genetic transformation in the environment. Microbiol. Rev. 1994; 58:563–602.
- 68. Ramsay J.P., Firth N. Diverse mobilization strategies facilitate transfer of non-conjugative mobile genetic elements. Curr. Opin. Microbiol. 2017; 38:1–9.
- 69. Ammann A., Neve H., Geis A., Heller K.J. Plasmid transfer via transduction from Streptococcus thermophilus to Lactococcus lactis. J. Bacteriol. 2008; 190:3083–3087.
- 70. Watson B.N.J., Staals R.H.J., Fineran P.C. CRISPR-Cas-mediated phage resistance enhances horizontal gene transfer by transduction. MBio. 2018; 9:e02406-17.
- 71. Bitto N.J., Chapman R., Pidot S., Costin A., Lo C., Choi J., D’Cruze T., Reynolds E.C., Dashper S.G., Turnbull L. et al. Bacterial membrane vesicles transport their DNA cargo into host cells. Sci. Rep. 2017; 7:7072.
- 72. Novick R.P. Plasmid incompatibility. Microbiol. Rev. 1987; 51:381–395.
- 73. Redondo-Salvo S., Fernández-López R., Ruiz R., Vielva L., de Toro M., Rocha E.P.C., Garcillán-Barcia M.P., de la Cruz F. Pathways for horizontal gene transfer in bacteria revealed by a global map of their plasmids. Nat. Commun. 2020; 11:3602.
- 74. Nicholson T.J., Jackson S.A., Croft B.I., Staals R.H.J., Fineran P.C., Brown C.M. Bioinformatic evidence of widespread priming in type I and II CRISPR-Cas systems. RNA Biol. 2019; 16:566–576.
- 75. Shah S.A., Hansen N.R., Garrett R.A. Distribution of CRISPR spacer matches in viruses and plasmids of crenarchaeal acidothermophiles and implications for their inhibitory mechanism. Biochem. Soc. Trans. 2009; 37:23–28.
- 76. Paez-Espino D., Morovic W., Sun C.L., Thomas B.C., Ueda K.-I., Stahl B., Barrangou R., Banfield J.F. Strong bias in the bacterial CRISPR elements that confer immunity to phage. Nat. Commun. 2013; 4:1430.
- 77. Soto-Perez P., Bisanz J.E., Berry J.D., Lam K.N., Bondy-Denomy J., Turnbaugh P.J. CRISPR-Cas system of a prevalent human gut Bacterium reveals hyper-targeting against phages in a human virome catalog. Cell Host Microbe. 2019; 26:325–335.
- 78. Rankin D.J., Rocha E.P.C., Brown S.P. What traits are carried on mobile genetic elements, and why? Heredity. 2011; 106:1–10.
- 79. Stern A., Keren L., Wurtzel O., Amitai G., Sorek R. Self-targeting by CRISPR: gene regulation or autoimmunity? Trends Genet. 2010; 26:335–340.
- 80. Mojica F.J.M., Díez-Villaseñor C., García-Martínez J., Almendros C. Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology. 2009; 155:733–740.
- 81. Shmakov S.A., Wolf Y.I., Savitskaya E., Severinov K.V., Koonin E.V.. Mapping CRISPR spaceromes reveals vast host-specific viromes of prokaryotes. Commun Biol. 2020; 3:321. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82. Andersson A.F., Banfield J.F.. Virus population dynamics and acquired virus resistance in natural microbial communities. Science. 2008; 320:1047–1050. [DOI] [PubMed] [Google Scholar]
- 83. Paez-Espino D., Sharon I., Morovic W., Stahl B., Thomas B.C., Barrangou R., Banfield J.F.. CRISPR immunity drives rapid phage genome evolution in Streptococcus thermophilus. MBio. 2015; 6:e00262-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84. Acman M., van Dorp L., Santini J.M., Balloux F.. 2020) Large-scale network analysis captures biological features of bacterial plasmids. Nat. Commun. 11:2452. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85. Getino M., de la Cruz F.. Natural and artificial strategies to control the conjugative transmission of plasmids. Microbiol Spectr. 2018; 6: 10.1128/microbiolspec.MTBP-0015-2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86. Deecker S.R., Ensminger A.W.. Type I-F CRISPR-Cas distribution and array dynamics in Legionella pneumophila. G3 (Bethesda). 2020; 10:1039–1050. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87. Almendros C., Guzmán N.M., García-Martínez J., Mojica F.J.M.. Anti-cas spacers in orphan CRISPR4 arrays prevent uptake of active CRISPR–Cas I-F systems. Nat. Microbiol. 2016; 1:16081. [DOI] [PubMed] [Google Scholar]
- 88. Varble A., Meaden S., Barrangou R., Westra E.R., Marraffini L.A.. Recombination between phages and CRISPR-cas loci facilitates horizontal gene transfer in staphylococci. Nat. Microbiol. 2019; 4:956–963. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89. Varble A., Campisi E., Euler C.W., Fyodorova J., Rostøl J.T., Fischetti V.A., Marraffini L.A.. Integration of prophages into CRISPR loci remodels viral immunity in Streptococcus pyogenes. bioRxiv. 2020; doi:10.1101/2020.10.09.333658, 09 October 2020, preprint: not peer reviewed. [DOI] [PubMed]
- 90. Rego R.O.M., Bestor A., Rosa P.A.. Defining the plasmid-borne restriction-modification systems of the Lyme disease spirochete Borrelia burgdorferi. J. Bacteriol. 2011; 193:1161–1171. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91. Van Melderen L.. Toxin–antitoxin systems: why so many, what for? Curr. Opin. Microbiol. 2010; 13:781–785. [DOI] [PubMed] [Google Scholar]
- 92. Leon L.M., Park A.E., Borges A.L., Zhang J.Y., Bondy-Denomy J.. Mobile element warfare via CRISPR and anti-CRISPR in Pseudomonas aeruginosa. Nucleic Acids Res. 2021; 49:2114–2125. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93. Minot S., Sinha R., Chen J., Li H., Keilbaugh S.A., Wu G.D., Lewis J.D., Bushman F.D.. The human gut virome: inter-individual variation and dynamic response to diet. Genome Res. 2011; 21:1616–1625. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94. Koonin E.V., Makarova K.S., Wolf Y.I., Krupovic M.. Evolutionary entanglement of mobile genetic elements and host defence systems: guns for hire. Nat. Rev. Genet. 2020; 21:119–131. [DOI] [PubMed] [Google Scholar]
- 95. Anderson R.E., Brazelton W.J., Baross J.A.. Using CRISPRs as a metagenomic tool to identify microbial hosts of a diffuse flow hydrothermal vent viral assemblage. FEMS Microbiol. Ecol. 2011; 77:120–133. [DOI] [PubMed] [Google Scholar]
- 96. Sanguino L., Franqueville L., Vogel T.M., Larose C.. Linking environmental prokaryotic viruses and their host through CRISPRs. FEMS Microbiol. Ecol. 2015; 91:fiv046. [DOI] [PubMed] [Google Scholar]
- 97. Hidalgo-Cantabrana C., Sanozky-Dawes R., Barrangou R.. Insights into the human virome using CRISPR spacers from microbiomes. Viruses. 2018; 10:479. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98. Dion M.B., Plante P.-L., Zufferey E., Shah S.A., Corbeil J., Moineau S.. Streamlining CRISPR spacer-based bacterial host predictions to decipher the viral dark matter. Nucleic Acids Res. 2021; 49:3127–3138. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99. Jackson S.A., McKenzie R.E., Fagerlund R.D., Kieper S.N., Fineran P.C., Brouns S.J.J.. CRISPR-Cas: adapting to change. Science. 2017; 356:eaal5056. [DOI] [PubMed] [Google Scholar]
- 100. Pinilla-Redondo R., Cyriaque V., Jacquiod S., Sørensen S.J., Riber L.. Monitoring plasmid-mediated horizontal gene transfer in microbiomes: recent advances and future perspectives. Plasmid. 2018; 99:56–67. [DOI] [PubMed] [Google Scholar]
- 101. Sen D., Brown C.J., Top E.M., Sullivan J.. Inferring the evolutionary history of IncP-1 plasmids despite incongruence among backbone gene trees. Mol. Biol. Evol. 2013; 30:154–166. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
Scripts for downloading data and reproducing all analyses are available at https://github.com/Russel88/CRISPRCas_on_Plasmids.