FEMS Microbiology Letters. 2019 Apr 17;366(7):fnz079. doi: 10.1093/femsle/fnz079

Predicted highly derived class 1 CRISPR-Cas system in Haloarchaea containing diverged Cas5 and Cas7 homologs but no CRISPR array

Kira S Makarova 1, Svetlana Karamycheva 1, Shiraz A Shah 2, Gisle Vestergaard 3, Roger A Garrett 2, Eugene V Koonin 1
PMCID: PMC6702361  PMID: 30993331

ABSTRACT

Screening of genomic and metagenomic databases for new variants of CRISPR-Cas systems increasingly results in the discovery of derived variants that do not seem to possess the interference capacity and are implicated in functions distinct from adaptive immunity. We describe an extremely derived putative class 1 CRISPR-Cas system that is present in many Halobacteria and consists of distant homologs of the Cas5 and Cas7 proteins along with an uncharacterized conserved protein and various nucleases. We hypothesize that, although this system lacks typical CRISPR effectors or a CRISPR array, it functions as an RNA-dependent defense mechanism that, unlike other derived CRISPR-Cas systems, utilizes alternative nucleases to cleave invader genomes.

Keywords: CRISPR-Cas systems; Cas7; Cas5; Halobacteria; defense systems; nuclease


HRAMP-derived class 1 CRISPR-Cas system not linked to CRISPR arrays.

INTRODUCTION

Bacterial and archaeal CRISPR-Cas systems show extensive diversity of gene composition and genomic loci organization. In addition to fully functional CRISPR-Cas variants, recent screenings of the rapidly growing genomic and metagenomic sequence databases identify various derived forms (Makarova, Wolf and Koonin 2018). Most of these reduced CRISPR-Cas systems lack the adaptation module, and some also lack the components of the effector module that are responsible for interference (Koonin and Makarova 2017; Makarova, Wolf and Koonin 2018). Examples include type IV systems that are typically carried by plasmids, minimalist variants of subtypes I-F and I-B encoded by Tn7-like transposons, and some chromosomally encoded variants associated with genes for components of signal transduction systems (Gleditzsch et al. 2016; Pausch et al. 2017; Peters et al. 2017; Shmakov et al. 2018; Ozcan et al. 2019). Most of these derived CRISPR-Cas versions have not been studied in any detail experimentally, but the lack of the components required for interference allows one to confidently predict that the functions of these variants are distinct from adaptive immunity (Peters et al. 2017). One hypothesis on the functions of the minimalist CRISPR-Cas variants encoded by mobile genetic elements (MGEs) is that they mediate guide RNA-dependent integration of the respective MGEs into host genomes (Koonin and Makarova 2017; Peters et al. 2017).

In addition to the derived CRISPR-Cas systems, stand-alone homologs of cas genes, such as cas1, cas2 and cas4, have also been noticed in other, non-CRISPR contexts (Krupovic et al. 2014; Hudaiberdiev et al. 2017; Koonin and Makarova 2017). Homologs of Cas1 are the transposases of a distinct class of self-synthesizing transposons, the casposons, that are thought to have been the evolutionary ancestors of the CRISPR adaptation module (Krupovic, Beguin and Koonin 2017). A remarkable case is a putative stress response system that consists of a protein that is homologous to Cas10 and shares with it a predicted polymerase domain, and a protein that is homologous to Csx1 of type III systems and, like the latter, consists of a signaling CARF domain and a HEPN RNase domain (Burroughs et al. 2015). The current hypothesis, proposed by analogy with type III CRISPR-Cas systems, is that this is a stress-induced system for dormancy induction and/or programmed cell death (Burroughs et al. 2015; Faure, Makarova and Koonin 2019). The functions of the rest of the solitary Cas proteins, however, remain obscure.

Here, we describe an extremely derived form of type I CRISPR-Cas that is conserved in many Halobacteria and consists of distant homologs of the RRM (RNA recognition motif) domain-containing proteins from the Cas7 and Cas5 groups along with an uncharacterized conserved protein and various nucleases, but is not associated with CRISPR arrays. This system has been detected previously but was not analyzed in detail (Vestergaard, Garrett and Shah 2014; Maier et al. 2017). We hypothesize that, despite the absence of typical Cas effectors, this system performs an RNA-dependent defense function.

METHODS

Genome analysis

The sequences of 524 complete or nearly complete archaeal genomes were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/). Sequences were assigned to the 2014 version of arCOGs (Makarova, Wolf and Koonin 2015) using PSI-BLAST (Altschul et al. 1997), with the arCOG alignments used as the position-specific scoring matrix (PSSM) sources as previously described (Makarova, Wolf and Koonin 2015).

Sequence analysis

Iterative profile searches using PSI-BLAST (Altschul et al. 1997), with the cut-off e-value of 0.01, and composition-based statistics and low complexity filtering turned off, were employed to search for distantly similar sequences in either the NR (non-redundant) database or the protein sequence database of 524 archaeal genomes. To detect distant sequence similarity, CDD-search (Marchler-Bauer et al. 2009), with cut-off e-value of 0.01 and low complexity filtering turned off, and HHpred search with default parameters were run against PDB, Pfam and CDD profile databases (Soding, Biegert and Lupas 2005). Protein secondary structure was predicted using Jpred 4 (Drozdetskiy et al. 2015). Multiple alignments of protein sequences were constructed using the MUSCLE program with default parameters (Edgar 2004). The 16S rRNA tree was constructed using FastTree with default parameters (Price, Dehal and Arkin 2010).
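The PSI-BLAST settings described above could be expressed roughly as follows. This is a minimal sketch that assumes the NCBI BLAST+ psiblast executable; the text does not state which BLAST release or command line was used, whether the 0.01 cut-off was the reporting or the profile-inclusion threshold (the sketch applies it to both), or how many iterations were run. The function name, file names and iteration count are hypothetical.

```python
from typing import List


def build_psiblast_command(query_fasta: str, database: str,
                           evalue: float = 0.01,
                           iterations: int = 5) -> List[str]:
    """Assemble a BLAST+ psiblast argument list (illustrative only).

    The iteration count is an assumption; the paper does not report it.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),             # reporting cut-off
        "-inclusion_ethresh", str(evalue),  # profile-inclusion cut-off (assumed equal)
        "-comp_based_stats", "0",           # composition-based statistics off
        "-seg", "no",                       # low-complexity filtering off
        "-num_iterations", str(iterations),
    ]


if __name__ == "__main__":
    # Hypothetical file and database names.
    print(" ".join(build_psiblast_command("query.faa", "archaea_524")))
```

Returning the arguments as a list rather than a single string lets the command be passed to subprocess.run without shell quoting.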

RESULTS

In the course of our systematic analysis of potential integrated elements (IE) in Halobacteria, we identified a region in the genome of Halogeometricum borinquense DSM 11551 that is flanked by XerC-like integrases, contains many predicted defense-related genes and thus appears to represent a typical defense island (Fig. 1A) (Makarova et al. 2011). Among these defense genes, there is one for the prokaryotic Argonaute protein associated with a PLD family nuclease that is fused to an ATPase (Willkomm, Makarova and Grohmann 2018), RM system components, a HEPN-MNT toxin–antitoxin module (Jia et al. 2018), several nucleases and others (Fig. 1A). Along with these genes, the island contains four uncharacterized genes that belong to arCOG08890, arCOG06425, arCOG06424 and arCOG09185. These four genes appear to belong to the same operon as the gene for an HNH family nuclease (arCOG03899).

Figure 1.

HRAMP: a putative derived CRISPR-Cas system containing two RAMP genes. (A) The putative integrated island containing many defense or DNA repair genes in which the HRAMP system was initially identified. Genes are shown approximately to scale by block arrows. Red outline shows genes that were further analyzed. Defense or DNA repair genes are shown in green, XerC and other transposases are shown in black, and uncharacterized genes are shown in gray. Other genes that are not tightly linked to defense are shown in white. Gene names or short descriptions are provided below the respective arrows. For detailed annotation of these genes, see Table S1 (Supporting Information). (B) Five genes from the island shown in (A) that form a conserved gene neighborhood. The genes are not shown to scale. arCOG numbers are indicated below each arrow. (C) Evolution of the HRAMP system in Halobacteria. The genomic architecture of the HRAMP locus is overlaid onto a 16S rRNA subtree (extracted from the respective complete tree built for all 123 Halobacteria for which the full-size 16S rRNA gene sequence is available in the genome) of halobacterial species that possess at least one of the core HRAMP genes, arCOG08890, arCOG06425 and arCOG06424 (if these genes are located on a plasmid, the plasmid name is indicated after the species name). Homologous genes are shown by arrows and colored either according to Figure 1B or according to the inset on the right. The genes are not shown to scale. arCOG numbers are indicated inside the arrows.

We applied the sensitive HHpred search to analyze the sequences of these four uncharacterized proteins. For WP_006054136, a representative of arCOG08890, HHpred identifies similarity to numerous Cas7 proteins, with the best hit (probability = 96%) to Csf2 proteins of type IV systems (Fig. S1, Supporting Information). Among the other top hits, there are PDB:4N0L, Csm3 (probability = 94%); 6MUT, Csm1 (probability = 94%) and other Cas7-like proteins. All proteins of arCOG08890 contain a typical glycine-rich loop, the signature of the RAMP (Repeat-Associated Mysterious Protein) superfamily of the RNA Recognition Motif (RRM) fold proteins that includes Cas5, Cas6 and Cas7 (Koonin, Makarova and Zhang 2017; Makarova, Zhang and Koonin 2017) (Fig. S1, Supporting Information). The proteins of arCOG06424 (WP_004594172 used as a query) showed a weaker but significant similarity to RAMP superfamily proteins. The C-terminal portion of WP_004594172 is similar to cd09700, Csx10 (85%), cd09684, Csm3 (73%) and other RAMP proteins, whereas the shorter N-terminal region is similar to cd09650, a generic profile for Cas7 of type I (probability = 37%) (Fig. S1, Supporting Information). Examination of the multiple alignment of arCOG06424 shows the presence of two conserved glycine-rich loops (Fig. S1, Supporting Information), which implies duplication of the RRM domain, a diagnostic feature of Cas5 proteins. Thus, it appears likely that arCOG06424 is an extremely diverged derivative of the Cas5 group RAMPs. Hereafter, we refer to this system as HRAMP (Halobacterial RAMP).

arCOG06425 consists of proteins with a predicted alpha/beta fold that are highly conserved in Halobacteria but show no significant similarity to known protein families. This family can therefore be considered a signature of the HRAMP system. Finally, proteins of arCOG09185 (WP_006054139 used as the query) are highly similar to PDB:5M1S, DNA editing Proofreading Exonuclease (probability = 99%). This enzyme belongs to a family of 3′-5′ exonucleases (RNase H fold) that is known as DEDDy, after a characteristic DxE-D-D motif (Fernandez-Leiro et al. 2017) (Fig. 1A and B; Fig. S1, Supporting Information).

We next identified all occurrences of arCOG08890, arCOG06424 and arCOG06425 in microbial genomes and explored their neighborhoods (Table S1, Supporting Information). At least one of these three arCOGs is represented in 34 of the 135 halobacterial genomes that are included in our current arCOG database, but in no other archaea (Table S1, Supporting Information). Searching the NCBI NR database using representatives of arCOG06425 (e.g. WP_013440547) and arCOG06424 (e.g. WP_006054138) as the queries failed to detect any homologs outside Halobacteria. Searches with some of the Cas7 homologs (arCOG08890) from the HRAMP loci (e.g. WP_006090784), under a relaxed E-value inclusion threshold (0.001), led to the identification of remote homologs in bacteria and archaea after the second PSI-BLAST iteration. However, examination of the corresponding genomic neighborhoods indicates that most of these proteins are encoded within typical CRISPR-Cas loci, usually of subtype III-D. Thus, the HRAMP system appears to be specific to Haloarchaea.
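The presence/absence tally behind the "34 of 135" figure can be illustrated with a short sketch. It assumes that arCOG assignments are available as a mapping from genome name to the set of arCOGs detected in that genome; this data layout, the function name and the toy genomes are hypothetical and are not taken from Table S1.

```python
from typing import Dict, List, Set

# The three core HRAMP arCOGs named in the text.
CORE_ARCOGS = frozenset({"arCOG08890", "arCOG06425", "arCOG06424"})


def genomes_with_core_gene(assignments: Dict[str, Set[str]]) -> List[str]:
    """Return sorted genome names that encode at least one core HRAMP arCOG."""
    return sorted(name for name, arcogs in assignments.items()
                  if arcogs & CORE_ARCOGS)


if __name__ == "__main__":
    # Toy input, not real data.
    toy = {
        "genome_A": {"arCOG08890", "arCOG06425", "arCOG06424", "arCOG09185"},
        "genome_B": {"arCOG03899"},
        "genome_C": {"arCOG06425"},
    }
    hits = genomes_with_core_gene(toy)
    print(hits, f"{len(hits)} of {len(toy)} genomes")
    # Fraction reported in the text: 34 of 135 halobacterial genomes.
    print(f"{34 / 135:.1%}")
```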

The species carrying the HRAMP locus are scattered across the halobacterial lineage (Table S1, Supporting Information). This patchy distribution indicates that HRAMP is not essential and is prone to horizontal gene transfer (HGT). Indeed, several HRAMP loci are located on plasmids (Fig. 1C). A total of 19 islands with the HRAMP locus also encode one or more XerC-like integrases, suggesting that many of these genes might be associated with IEs (Table S1, Supporting Information). In the identified neighborhoods, arCOG08890, arCOG06425 and arCOG06424 are typically present together and are always encoded in putative operons, suggesting that the proteins encoded by these three genes comprise the core of the HRAMP system, in which all the components are functionally linked and might form a complex. Among the genes that are potentially co-expressed with the three core genes, there are those for the DEDDy family nuclease (arCOG09185), an HNH family nuclease that is typically fused to an HTH domain (arCOG03899 and arCOG03898), a Zn finger-containing protein (arCOG10929), a small uncharacterized protein (arCOG08128) and transcriptional regulators from two subfamilies of the ArsR family (arCOG03924 and arCOG08095) (Fig. 1C). All the nucleases and transcription regulators that are associated with HRAMP were also observed in different contexts in other genomes, suggesting that these are ancillary components of the HRAMP system and that the three core proteins might possess some activity on their own (Table S1, Supporting Information).
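The notion of a putative operon used above can be made concrete with a simple heuristic: adjacent genes on the same strand separated by a short intergenic distance are grouped together. The text does not state which criteria were applied, so the sketch below, including the 50 bp gap threshold, the Gene record and the toy coordinates, is an assumption for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based start coordinate
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"


def putative_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent same-strand genes separated by at most max_gap bp.

    max_gap is a hypothetical threshold, not a value from the paper.
    """
    operons: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            prev = operons[-1][-1]
            if gene.strand == prev.strand and gene.start - prev.end <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return [[g.name for g in operon] for operon in operons]


if __name__ == "__main__":
    # Toy coordinates, not taken from any real genome.
    locus = [
        Gene("arCOG08890", 100, 1000, "+"),
        Gene("arCOG06425", 1010, 1900, "+"),
        Gene("arCOG06424", 1905, 3000, "+"),
        Gene("unrelated", 5000, 5600, "-"),
    ]
    print(putative_operons(locus))
```

Overlapping gene pairs yield a negative gap and are therefore kept in the same group, which matches the common situation of short overlaps between co-transcribed genes.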

DISCUSSION

Recent comparative genomics studies led to the identification of various derived CRISPR-Cas systems. Here, we describe a derivative of class 1 CRISPR-Cas systems, so far limited in its spread to Halobacteria. This system is reduced to the bare minimum of essential components, namely, only two Cas proteins, highly diverged homologs of Cas7 and Cas5 (thus, we denote this system HRAMP, after two Halobacterial RAMPs), and no CRISPR array or adaptation genes. By analogy with the functional type I systems, it appears likely that Cas7 homologs (arCOG08890) form the backbone of the effector complex, whereas Cas5 homologs (arCOG06424) bind the 5′ handle of a guide RNA (Makarova, Zhang and Koonin 2017). The role of the third core component of HRAMP, arCOG06425, is unclear, especially as it has been shown that the core effector complex can be assembled from Cas5 and Cas7 subunits only, including the I-B effector complex in Halobacteria (Maier et al. 2018). Considering the remarkable sequence conservation of arCOG06424 proteins among Halobacteria and the presence of several invariant amino acids typically involved in catalysis (Fig. S1, Supporting Information), these proteins might be involved in the processing of a precursor RNA. Alternatively, it cannot be ruled out, by analogy with some type III systems (Pyenson and Marraffini 2017), that the Cas7 homolog itself is responsible for precursor RNA cleavage, especially considering the presence of an invariant histidine in the arCOG08890 alignment (Fig. S1, Supporting Information), whereas arCOG06424 could be an effector nuclease.

The HRAMP system is present in 34 haloarchaeal genomes in our dataset, but only 14 of those additionally encode a subtype I-B or I-D CRISPR-Cas system. In the other 20 genomes, HRAMP is the only detectable CRISPR-Cas system (Table S1, Supporting Information). This observation strongly suggests that HRAMP operates with a distinct, still unidentified, non-CRISPR RNA or, possibly, ssDNA. Given the tight link between the HRAMP core genes and the genes coding for two distinct DNases (HNH and DEDDy), it appears most likely that HRAMP targets dsDNA.

There is a notable analogy between HRAMP and type IV CRISPR-Cas systems, which are also degraded derivatives of class 1 CRISPR-Cas systems. Moreover, both HRAMP and type IV systems are often associated with plasmids and integrated elements, suggesting that they might have uncharacterized roles in the reproduction or maintenance of these mobile elements. A recent analysis of the type IV spacers strongly suggests that at least some type IV systems are involved in inter-plasmid competition (Newire et al. 2019). There is, however, an apparent major difference between HRAMP and type IV systems in that the former, unlike the latter, are tightly associated with nucleases and thus are strongly predicted to target DNA for degradation.

The discovery of HRAMP shows that the evolutionary and functional plasticity of CRISPR-Cas could be even greater than currently appreciated. In order to better understand the functional limits of CRISPR-Cas, it will be important to elucidate the molecular mechanisms of HRAMP. Should these turn out to be analogous to those of the typical CRISPR-Cas systems, with the HNH and DEDDy nucleases (and possibly arCOG06425) replacing the known CRISPR effector nucleases, HRAMP would be a candidate for type VII. However, given the extremely derived state of HRAMP, it is currently impossible to exclude that this system functions on principles fundamentally different from those of CRISPR-Cas.

Supplementary Material

Supplemental Files

ACKNOWLEDGEMENTS

We thank Dr. Anita Marchfelder for helpful discussions. KSM and EVK are supported by intramural funds of the US Department of Health and Human Services (to National Library of Medicine). SAS is supported by the Capital Region of Denmark (grant no. A6291).

Conflict of interest

The authors declare that there are no conflicts of interest.

REFERENCES

  1. Altschul SF, Madden TL, Schaffer AA et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
  2. Burroughs AM, Zhang D, Schaffer DE et al. Comparative genomic analyses reveal a vast, novel network of nucleotide-centric systems in biological conflicts, immunity and signaling. Nucleic Acids Res. 2015;43:10633–54.
  3. Drozdetskiy A, Cole C, Procter J et al. JPred4: a protein secondary structure prediction server. Nucleic Acids Res. 2015;43:389–94.
  4. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
  5. Faure G, Makarova KS, Koonin EV. CRISPR-Cas: complex functional networks and multiple roles beyond adaptive immunity. J Mol Biol. 2019;431:3–20.
  6. Fernandez-Leiro R, Conrad J, Yang JC et al. Self-correcting mismatches during high-fidelity DNA replication. Nat Struct Mol Biol. 2017;24:140–3.
  7. Gleditzsch D, Muller-Esparza H, Pausch P et al. Modulating the Cascade architecture of a minimal Type I-F CRISPR-Cas system. Nucleic Acids Res. 2016;44:5872–82.
  8. Hudaiberdiev S, Shmakov S, Wolf YI et al. Phylogenomics of Cas4 family nucleases. BMC Evol Biol. 2017;17:232.
  9. Jia X, Yao J, Gao Z et al. Structure-function analyses reveal the molecular architecture and neutralization mechanism of a bacterial HEPN-MNT toxin-antitoxin system. J Biol Chem. 2018;293:6812–23.
  10. Koonin EV, Makarova KS. Mobile genetic elements and evolution of CRISPR-Cas systems: All the way there and back. Genome Biol Evol. 2017;9:2812–25.
  11. Koonin EV, Makarova KS, Zhang F. Diversity, classification and evolution of CRISPR-Cas systems. Curr Opin Microbiol. 2017;37:67–78.
  12. Krupovic M, Beguin P, Koonin EV. Casposons: mobile genetic elements that gave rise to the CRISPR-Cas adaptation machinery. Curr Opin Microbiol. 2017;38:36–43.
  13. Krupovic M, Makarova KS, Forterre P et al. Casposons: a new superfamily of self-synthesizing DNA transposons at the origin of prokaryotic CRISPR-Cas immunity. BMC Biol. 2014;12:36.
  14. Maier LK, Alkhnbashi OS, Backofen R et al. CRISPR and Salty: CRISPR-Cas systems in Haloarchaea. In: Clouet-d'Orval B. (ed.) RNA Metabolism and Gene Expression in Archaea. Springer, Cham, 2017, 243–69.
  15. Maier LK, Stachler AE, Brendel J et al. The nuts and bolts of the Haloferax CRISPR-Cas system I-B. RNA Biol. 2018;21:1–12.
  16. Makarova KS, Wolf YI, Koonin EV. Archaeal clusters of orthologous genes (arCOGs): an update and application for analysis of shared features between thermococcales, methanococcales, and methanobacteriales. Life (Basel). 2015;5:818–40.
  17. Makarova KS, Zhang F, Koonin EV. SnapShot: class 1 CRISPR-Cas systems. Cell. 2017;168:946–946 e941.
  18. Makarova KS, Wolf YI, Koonin EV. Classification and nomenclature of CRISPR-Cas systems: where from here? CRISPR J. 2018;1:325–36.
  19. Makarova KS, Wolf YI, Snir S et al. Defense islands in bacterial and archaeal genomes and prediction of novel defense systems. J Bacteriol. 2011;193:6039–56.
  20. Marchler-Bauer A, Anderson JB, Chitsaz F et al. CDD: specific functional annotation with the conserved domain database. Nucleic Acids Res. 2009;37:D205–10.
  21. Newire E, Aydin A, Juma S et al. Identification of a Type IV CRISPR-Cas system located exclusively on IncHI1B/IncFIB plasmids in Enterobacteriaceae. bioRxiv. 2019. doi: https://doi.org/10.1101/536375.
  22. Ozcan A, Pausch P, Linden A et al. Type IV CRISPR RNA processing and effector complex formation in Aromatoleum aromaticum. Nature Microbiol. 2019;4:89–96.
  23. Pausch P, Muller-Esparza H, Gleditzsch D et al. Structural variation of type I-F CRISPR RNA guided DNA surveillance. Mol Cell. 2017;67:622–632 e624.
  24. Peters JE, Makarova KS, Shmakov S et al. Recruitment of CRISPR-Cas systems by Tn7-like transposons. Proc Natl Acad Sci USA. 2017;114:E7358–66.
  25. Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5:e9490.
  26. Pyenson NC, Marraffini LA. Type III CRISPR-Cas systems: when DNA cleavage just isn't enough. Curr Opin Microbiol. 2017;37:150–4.
  27. Shmakov SA, Makarova KS, Wolf YI et al. Systematic prediction of genes functionally linked to CRISPR-Cas systems by gene neighborhood analysis. Proc Natl Acad Sci USA. 2018;115:E5307–16.
  28. Soding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005;33:W244–8.
  29. Vestergaard G, Garrett RA, Shah SA. CRISPR adaptive immune systems of Archaea. RNA Biol. 2014;11:156–67.
  30. Willkomm S, Makarova KS, Grohmann D. DNA silencing by prokaryotic Argonaute proteins adds a new layer of defense against invading nucleic acids. FEMS Microbiol Rev. 2018;42:376–87.

Articles from FEMS Microbiology Letters are provided here courtesy of Oxford University Press
