Nucleic Acids Research. 2026 Mar 14;54(5):gkag225. doi: 10.1093/nar/gkag225

Diversity and evolution of archaeal immune strategies

Laura Martínez-Alvarez 1, Xu Peng 2
PMCID: PMC12988326  PMID: 41830331

Abstract

Archaeal antiviral defense systems remain poorly characterized despite recent advances in understanding prokaryotic immunity. Here, we analyze 7747 archaeal genomes, the largest and most diverse dataset to date, revealing a striking disparity in defense system prevalence and diversity compared to Bacteria. Nearly one-third of archaeal genomes have no detected systems beyond CRISPR-Cas and restriction-modification (in contrast to only 2.2% bacterial genomes), and only 50–55% contain CRISPR-Cas systems, far below previous estimates. Many known defense systems appear restricted to Bacteria, while several single-gene putative candidate systems (PDCs) recently identified through a guilt-by-embedding approach are enriched in Archaea. Phylogenetic analyses suggest that PDC-S70 and PDC-M05 likely originated in Archaea, representing rare archaeal contributions to the prokaryotic immune repertoire. Consistent with earlier studies, our findings support the existence of deep evolutionary links between archaeal and eukaryotic systems for argonautes and viperins. These analyses highlight both the underexplored nature and the evolutionary significance of archaeal immunity, calling for expanded efforts to uncover archaeal-specific systems and improve our understanding of immune evolution across domains of life.

Graphical Abstract


Introduction

Mobile genetic elements (MGEs) are major drivers of horizontal gene transfer (HGT). They contribute to microbial genetic diversity and evolutionary innovation by facilitating the spread of adaptive traits, such as antibiotic resistance and virulence factors, via transformation, transduction, and conjugation [1, 2]. Their interactions with hosts range from mutualistic to parasitic, often imposing significant fitness costs [3, 4].

To counteract the costs imposed by MGEs, prokaryotes have evolved diverse defense strategies, from receptor modifications to sophisticated defense systems that degrade or modify nucleic acids, arrest cell growth, or disrupt membranes [5, 7]. Defense systems often cluster in genomic “defense islands”, regions prone to HGT, frequently alongside MGEs [8], an arrangement that may facilitate their synergy and co-mobilization [9, 10]. The reciprocal selective pressure between MGEs and host defenses drives an ongoing evolutionary arms race, resulting in rapid innovation of prokaryotic immunity [4].

Over 300 antiviral defense systems have been identified to date, many through recent advances in computational approaches [11–15]. Although more than one-third of known systems also appear in Archaea [6], studies of archaeal immunity have remained limited in scope. Until recently, analyses relied primarily on RefSeq genomes, where archaea comprised fewer than 2% of the data [16, 17]. This limitation has led to assumptions that archaeal immune landscapes largely mirror those of bacteria, with CRISPR-Cas and restriction-modification (RM) systems being the primary known components.

Two recent studies expanded defense analyses in Archaea but focused primarily on the Asgardarchaeota phylum or on specific systems (viperins and argonautes) [18, 19]. Only CRISPR-Cas diversity has been comprehensively mapped across the domain [20–22]. Broader evaluations of other systems, including classical ones like restriction-modification, are lacking.

Meanwhile, archaeal genome availability has surged, especially through the recovery of uncultured lineages via environmental sequencing [23, 24], making a comprehensive re-evaluation of archaeal defenses timely. Understanding archaeal immunity is key to reconstructing the evolution of antiviral defense across living organisms, especially given the proposed archaeal ancestry of eukaryotes [25, 26] and evolutionary connections between bacterial and eukaryotic systems [27–29].

In this study, we analyze 7747 archaeal genomes, the largest and most taxonomically diverse dataset to date, to reassess the distribution, diversity, and evolution of antiviral defense systems in Archaea.

Materials and methods

Data

The GTDB database (accessed on May 5, 2023) was used to retrieve the accessions and metadata for archaeal and bacterial genomes with over 50% completeness and < 20% contamination [24, 30]. From this, we obtained a dataset of 7747 archaeal genomes and 40 000 bacterial genomes (the latter randomly subsampled from the 394 933 available bacterial entries), which were downloaded from NCBI [31] to create the Archaea and Bacteria datasets, respectively (Supplementary Table 1). Proteomes for each genome were retrieved from NCBI when available, or predicted from the genomic sequence using Prodigal v2.6.3 [32]. Supplementary Fig. 1 depicts the taxonomic diversity of the genomes in the Archaea dataset, which includes 10 additional GTDB phyla not covered previously [6, 17].

Identification of defense systems

To identify known defense systems in the archaeal and bacterial genomes, we used DefenseFinder v1.2.2 [17], Padloc v2.0.0 [16], and CRISPRCasTyper v1.8.0 [33] with default settings. These tools detect the protein families of defense systems through HMM-based homology searches and apply system-architecture rules to group hits into canonical defense operons. To identify restriction-modification systems using the REBASE repository (accessed in September 2024) [34], archaeal proteins were matched to REBASE entries using MMseqs2 (release_15–6f452) [35] with the parameters --min-seq-id 0.65, --cov-mode 0, and -c 0.8. Each protein was assigned the function and type of its best match by e-value, yielding a pool of candidate components for restriction-modification systems. To refine these candidates, we excluded proteins identified as part of non-RM systems by Padloc or DefenseFinder. An in-house script then retrieved the genomic neighborhood of every protein annotated as a restriction enzyme, capturing the five genes upstream and five genes downstream of each restriction nuclease. From these neighborhoods, RM systems were called according to the following criteria: Type I RM, presence of type I R, M, and S components; Type II RM, presence of type II R and M components; Type III RM, presence of type III R and M components; Type IV, presence of a type IV module alone; Type IIG, presence of a type IIG module alone. A comprehensive list of predicted RM modules is available in Supplementary Table 6. Analyses and graphical visualization of data were carried out using R version 4.3.3 (2024-02-29) [36], RStudio version 2024.04.1 Build 748 [37], and the package ggplot2 v3.4.2 [38].
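The neighborhood rules above can be illustrated with a short sketch. This is not the authors' in-house script: it assumes each contig is an ordered list of genes, each either unannotated (`None`) or tagged with a hypothetical `(RM type, component)` pair taken from its best REBASE hit, and it assumes that type IV and type IIG modules carry the nuclease component label `"R"`.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical annotation: (RM type, component), e.g. ("I", "S"); None = no RM hit.
Gene = Optional[Tuple[str, str]]

# Components required for a call, per RM type (criteria as listed in the text).
REQUIRED: Dict[str, Set[str]] = {
    "I": {"R", "M", "S"},
    "II": {"R", "M"},
    "III": {"R", "M"},
    "IV": {"R"},   # assumption: a type IV module alone suffices
    "IIG": {"R"},  # assumption: a fused type IIG module alone suffices
}


def neighborhood(genes: List[Gene], i: int, flank: int = 5) -> List[Gene]:
    """Return the gene at index i plus up to `flank` genes on each side."""
    return genes[max(0, i - flank): i + flank + 1]


def call_rm_systems(genes: List[Gene], flank: int = 5) -> List[Tuple[int, str]]:
    """Call RM systems around every restriction nuclease (component "R")."""
    calls = []
    for i, gene in enumerate(genes):
        if gene is None or gene[1] != "R":
            continue
        rm_type = gene[0]
        # Components of the same RM type found within the +/- flank window.
        present = {g[1] for g in neighborhood(genes, i, flank)
                   if g is not None and g[0] == rm_type}
        if rm_type in REQUIRED and REQUIRED[rm_type] <= present:
            calls.append((i, f"Type {rm_type}"))
    return calls


# Toy contig: a complete type I locus, an orphan type II nuclease whose
# methyltransferase lies outside the 5-gene window, and a type IV nuclease.
contig: List[Gene] = ([("I", "M"), ("I", "S"), ("I", "R"), None, ("II", "R")]
                      + [None] * 7 + [("II", "M"), ("IV", "R")])
print(call_rm_systems(contig))  # [(2, 'Type I'), (13, 'Type IV')]
```

Because calls are anchored on nucleases, a locus with two R genes would be reported twice in this sketch; a real pipeline would need to merge overlapping calls.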
The final comparison of the archaeal and bacterial immune landscapes was done using the output of Padloc after discarding entries labeled as DNA modification systems (DMS), which denote proteins involved in defense systems that modify DNA but cannot be classified as complete defense systems, and VSPR entries, which are not defense systems. For the identification of archaeal restriction-modification systems, the output of the REBASE-based approach described above was used instead of the PADLOC prediction.

Statistical analysis

Taxonomic prevalence analysis

The prevalence of each defense system across archaeal phyla was modelled using binomial generalized linear models. Several systems were rare within individual phyla, and ordinary logistic regression showed signs of data separation and infinite or poorly behaved standard errors. To address this, we used a bias-reduced logistic regression model with a logit link in the brglm2 package (v0.9) [39], which provides finite and less biased estimates under separation. Estimated marginal means were calculated with the emmeans package (v1.11.1) [40] to compare phylum-specific prevalence against the overall archaeal mean. P-values were adjusted for multiple tests using the false discovery rate (FDR) method, with significance defined as adjusted P < 0.05 (Supplementary Table 9). Phyla with fewer than 10 genomes were not included in the analysis.
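Why bias reduction avoids infinite estimates can be seen in the simplest case. For an intercept-only binomial model with a logit link (one phylum, no covariates), Firth-type mean bias reduction, one of the methods brglm2 provides, coincides with a Jeffreys-prior penalty and has the closed-form estimate (y + 0.5)/(n + 1), so the logit stays finite even when a system is absent from, or present in, every genome of a phylum. The sketch below shows this together with a Benjamini–Hochberg FDR adjustment; it is an illustration only and does not reproduce the emmeans contrasts used in the study.

```python
import math
from typing import List


def firth_prevalence(y: int, n: int) -> float:
    """Bias-reduced (Jeffreys-penalized) proportion for an intercept-only logit model."""
    return (y + 0.5) / (n + 1)


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# A system absent from all 20 genomes of a phylum: the maximum-likelihood
# logit is -infinity, but the bias-reduced estimate stays finite.
print(round(logit(firth_prevalence(0, 20)), 3))  # -3.714 (= -ln 41)
print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```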

Abundance and diversity of defense systems across domains

Statistical analysis of the abundance and diversity of defense systems in the archaeal and bacterial datasets was done using the vegan package (v.4.1.3) [41]. The Shapiro–Wilk test showed evidence of non-normality for the genome size (W = 0.893, P-value < 2.2 × 10⁻⁶ for archaea and W = 0.968, P-value < 2.2 × 10⁻⁶ for bacteria), total system counts per genome (W = 0.678, P-value < 2.2 × 10⁻⁶ for archaea and W = 0.735, P-value < 2.2 × 10⁻⁶ for bacteria), and defense system diversity per genome (W = 0.788, P-value < 2.2 × 10⁻⁶ for archaea and W = 0.957, P-value < 2.2 × 10⁻⁶ for bacteria) distributions in the archaeal and bacterial datasets.

Effect of genome completeness on defense system abundance

Genome completeness estimates from GTDB metadata used for this analysis are provided in Supplementary Table 1. A filtered dataset of high-quality MAGs (≥90% completeness) was used to investigate the effect of assembly completeness on defense system prevalence for both Bacteria and Archaea (Fig. 2, right panel). To determine whether prevalences differed between the high-quality datasets of the two domains, 95% confidence intervals (Wilson method) were calculated using the function binom.confint() from the package binom (v1.1.1.1) [42]. For each core defense system, a logistic regression model was fitted with presence/absence as the response and genome completeness and domain as predictors using the function glm(). Predicted prevalence and Wald-type 95% confidence intervals were calculated by transforming model predictions from the logit scale to probabilities using the inverse logit function (Supplementary Fig. 6).
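The Wilson interval used for the domain comparison has a simple closed form. A minimal sketch follows; the overlap check mirrors a comparison "based on 95% confidence intervals", but the helper names and the toy counts (40/100 versus 60/100 genomes) are hypothetical and are not data from this study.

```python
import math
from typing import Tuple


def wilson_ci(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical counts: 40 of 100 archaeal vs 60 of 100 bacterial genomes.
arc, bac = wilson_ci(40, 100), wilson_ci(60, 100)
print(arc, bac, intervals_overlap(arc, bac))
```

Unlike the Wald interval, the Wilson interval stays inside [0, 1] and behaves well for prevalences near 0 or 1. Note that requiring non-overlapping intervals is a conservative criterion for a difference between two proportions.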

Figure 2.

Bubble plot showing the prevalence of the 20 most abundant defense systems in archaeal and bacterial genomes. Two panels compare prevalence in the full genome datasets and in high-quality genomes (≥90% completeness). Circle size and color indicate the percentage of genomes encoding each system. Overall, RM systems are the most prevalent across both domains, and most systems are more prevalent in Bacteria. Only CRISPR-Cas, viperin, PT, argonaute, and five PDCs are more prevalent in high-quality archaeal genomes. Archaea have a higher fraction of genomes without detectable defense systems.

Prevalence of the 20 most abundant defense systems in Archaea and Bacteria. The percentage of genomes encoding each defense system is shown in the full datasets (left) and for high-quality (HQ) genomes only (≥90% completeness; right). Circle size and color scale with prevalence, and numerical values indicate prevalence percentages. Highlighted defense systems are significantly more prevalent in Archaea than in Bacteria in the HQ datasets, based on 95% confidence intervals. The arrow indicates the fraction of genomes with no defense systems detected. Prevalence values are based on Padloc outputs, except for archaeal RM systems (*), where prevalence was calculated using the REBASE-based approach (Materials and Methods and Fig. 1B).

Contribution of genome size and domain to defense system abundance

Correlation between genome size and defense system abundance and diversity was evaluated using Spearman’s rank correlation. Linear regression models (defense system counts ∼ genome size) were fitted using the lm() function in R (R² = 0.218, F(1, 47 745) = 13 330, P < 0.001). Residuals were extracted using residuals() and compared between domains using a Wilcoxon rank-sum test. Variance partitioning was performed using the varpart() function from the vegan package. Defense system counts were normalized by genome size (per Mb) to compare density across domains, and the results are shown in Supplementary Fig. 4.
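The size-adjustment steps above (rank correlation, pooled regression, residual comparison, per-Mb density) can be sketched with SciPy. This is a hedged Python analogue of the R workflow, not the authors' code; the function name and the simulated data are hypothetical. SciPy's `mannwhitneyu` reports U for the first sample, which corresponds to W from R's `wilcox.test(x, y)` with the same argument order.

```python
import numpy as np
from scipy import stats


def size_adjusted_domain_test(size_mb, counts, is_archaea):
    """Regress counts on genome size (pooled), then compare residuals between domains."""
    size_mb = np.asarray(size_mb, dtype=float)
    counts = np.asarray(counts, dtype=float)
    is_archaea = np.asarray(is_archaea, dtype=bool)

    rho, rho_p = stats.spearmanr(size_mb, counts)
    fit = stats.linregress(size_mb, counts)  # counts ~ genome size
    residuals = counts - (fit.intercept + fit.slope * size_mb)
    # Two-sided rank-sum test on residuals: Bacteria first, Archaea second.
    u, u_p = stats.mannwhitneyu(residuals[~is_archaea], residuals[is_archaea],
                                alternative="two-sided")
    density = counts / size_mb  # systems per Mb
    return {
        "spearman_rho": float(rho), "spearman_p": float(rho_p),
        "r_squared": float(fit.rvalue ** 2),
        "residual_U": float(u), "residual_p": float(u_p),
        "median_density_bac": float(np.median(density[~is_archaea])),
        "median_density_arc": float(np.median(density[is_archaea])),
    }


# Simulated data only (not the study's): smaller archaeal genomes, extra bacterial systems.
rng = np.random.default_rng(42)
size = np.concatenate([rng.uniform(0.5, 3.0, 300), rng.uniform(1.5, 8.0, 300)])
arc = np.array([True] * 300 + [False] * 300)
counts = rng.poisson(1.5 * size + np.where(arc, 0.0, 3.0))
print(size_adjusted_domain_test(size, counts, arc))
```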

Optimal growth temperature and archaeal core-defensome abundance

Optimal growth temperature (OGT) for archaeal genomes was estimated using Tome [43], which predicts growth temperature from proteome composition. Genomes were classified as mesophilic (25 °C to ≤50 °C), thermophilic (≥50 °C to <80 °C), or hyperthermophilic (≥80 °C). Differences in core-defensome abundance across temperature categories were first assessed using the Kruskal–Wallis test, followed by Dunn’s post hoc test with Bonferroni correction using the dunn.test package (v1.3.6) [44]. To quantify the relationship between OGT (as a continuous variable) and defensome abundance, we fitted negative binomial generalized linear models (NB-GLMs) with a logarithmic link using the MASS package (v7.3.60.22) [45] to account for overdispersion. For RM abundance, total defensome abundance, and system diversity per genome, zero-truncated hurdle negative binomial models were fitted with the pscl package (v1.5.9) [46] to account for zero-deflation and overdispersion. Nagelkerke’s pseudo-R² of the negative binomial models was calculated using the performance package (v0.15.3) [47]. Model selection and diagnostic checks were performed using the DHARMa package (v0.4.7) [48]. Corresponding results are shown in Supplementary Fig. 8.
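The temperature binning and the first (Kruskal–Wallis) test can be sketched as follows. The class bounds come from the text, but because both the mesophilic and thermophilic ranges include 50 °C, this sketch assumes, arbitrarily, that exactly 50 °C is thermophilic, and that genomes below 25 °C are left unclassified. The helper names are hypothetical, and Dunn's post hoc test is not shown.

```python
from typing import Dict, List, Optional, Sequence, Tuple

from scipy import stats


def ogt_class(ogt_c: float) -> Optional[str]:
    """Bin a predicted OGT (°C); 50 °C is assigned to 'thermophilic' (assumption)."""
    if ogt_c < 25:
        return None  # below the mesophilic range given in the text
    if ogt_c < 50:
        return "mesophilic"
    if ogt_c < 80:
        return "thermophilic"
    return "hyperthermophilic"


def kruskal_by_class(ogts: Sequence[float],
                     counts: Sequence[int]) -> Tuple[Dict[str, List[int]], float, float]:
    """Group defensome counts by OGT class and run a Kruskal-Wallis H test."""
    groups: Dict[str, List[int]] = {}
    for t, c in zip(ogts, counts):
        label = ogt_class(t)
        if label is not None:
            groups.setdefault(label, []).append(c)
    h, p = stats.kruskal(*groups.values())
    return groups, float(h), float(p)


# Hypothetical OGTs and defense-system counts, for illustration only.
groups, h, p = kruskal_by_class([30, 40, 60, 70, 85, 90], [1, 2, 5, 6, 9, 10])
print(sorted(groups), round(h, 3), round(p, 3))
```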

To quantify how individual genomic and ecological variables contribute to defense abundance variation in each domain, we applied the same NB-GLM framework described above. Separate one-predictor models were fitted with log-transformed genome size, phylum, genome completeness, or optimal growth temperature (Archaea only) as predictors, and Nagelkerke’s pseudo-R² was calculated for each predictor. A full model containing all predictors was also fitted to evaluate combined explanatory power (Supplementary Fig. 4E).
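Nagelkerke's pseudo-R² rescales the Cox–Snell R² by its maximum attainable value, and needs only the log-likelihoods of the fitted model and of an intercept-only model. A minimal sketch; the log-likelihoods could come from any fitted count model (for example the `llf` and `llnull` attributes of a fitted statsmodels discrete model), and the numbers in the example are made up.

```python
import math


def nagelkerke_r2(loglik_model: float, loglik_null: float, n: int) -> float:
    """Nagelkerke pseudo-R2 from model and intercept-only log-likelihoods (n observations)."""
    # Cox-Snell: 1 - (L0 / L1)^(2/n), written on the log scale.
    cox_snell = 1 - math.exp(2 * (loglik_null - loglik_model) / n)
    # Maximum Cox-Snell value, reached when the model likelihood equals 1.
    max_cox_snell = 1 - math.exp(2 * loglik_null / n)
    return cox_snell / max_cox_snell


# Made-up log-likelihoods for a one-predictor model on 50 genomes.
print(round(nagelkerke_r2(-90.0, -100.0, 50), 4))
```

Comparing this value across one-predictor models (genome size, phylum, completeness, OGT) and the full model gives the per-predictor contributions described above.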

Analyses were performed in R (v4.4.0) [36]. All visualizations were generated using the ggplot2 package (v3.4.2) [38]. Figures of composite graphs were generated using the package patchwork (v1.3.0) [49].

Phylogenetic analyses

Components of defense systems identified by Padloc were used for all phylogenetic analyses, except for restriction-modification systems, which were analyzed using components identified by the REBASE-based approach. For specific defense systems, we used concatenated DndC and DndD amino acid sequences for the DndABCDE phosphorothioation system, PIWI-domain components for argonautes, the AbiEii module for AbiE systems, the cyclase module for CBASS, and the M05A nucleotidyltransferase for PDC-M05. Eukaryotic sequences were obtained from the following sources: eukaryotic viperins from Shomar et al. [19], cGAS-like pattern recognition receptors (cGLRs) from Li et al. [50], and argonautes from Swarts et al. [51].

To reduce sequence redundancy, we clustered the components of each defense system using cd-hit v4.8.1 at a 65% similarity threshold [52]. Amino acid sequences were then aligned with MAFFT v7.505 using the --auto option [53] and trimmed with TrimAl v1.5.rev0 [54] using the -gappyout setting. Preliminary phylogenetic trees were constructed with FastTree v2.1.11 [55] using the parameters -lg and -boot 100. The preliminary trees were pruned with Treemmer v0.3 [56] to retain sequence diversity, resulting in a reduced set of sequences for constructing the phylogenetic trees presented in Fig. 5. For these trees, sequences were realigned using MAFFT with the parameters --maxiterate 1000 and --localpair, and then trimmed with TrimAl using the -gt 0.3 option. Final phylogenetic trees were constructed with IQ-TREE v2.0.7 [57] using the options -m MFP -bb 1000 -alrt 1000 -bnni, with the best-fit model selected by ModelFinder. The models used were as follows: LG + R6 for AbiE and PDC-S05; LG + F + R6 for SoFic; LG + F + R9 for CBASS and PDC-S27; LG + F + R8 for pAgo; LG + R + R10 for PT; LG + R10 for viperin; LG + F + R7 for PDC-M05 (subunit M05A); LG + F + R10 for PDC-S01; LG + R7 for PDC-S09; and LG + R9 for PDC-S70. Branch support was assessed using ultrafast bootstrap approximation and approximate likelihood ratio tests, with 1000 replicates each. Phylogenetic trees were visualized using iTOL [58]. Trees were midpoint-rooted, except for the viperin and CBASS trees, which were rooted using MoaA and OAS sequences, respectively, as outgroups, following previous analyses [19, 59]. The PDC-S27 tree was rooted using AAA-ATPases (PF00004) as the outgroup.
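The tree-building steps above can be summarized as a dry-run list of command lines. This is a sketch under several assumptions not stated in the text: the executable names (`cd-hit`, `mafft`, `trimal`, `FastTree`, `iqtree2`), all file names, the cd-hit word size `-n 4` (which cd-hit requires for identity thresholds between 0.6 and 0.7), and the IQ-TREE thread count `-T`. Output redirection for MAFFT and FastTree, and the Treemmer pruning step, are not shown.

```python
import shlex
from typing import List


def phylo_commands(name: str, threads: int = 4) -> List[List[str]]:
    """Dry-run command lines mirroring the described pipeline (file names hypothetical)."""
    return [
        # 1. Redundancy reduction at 65% identity (word size 4 for 0.6-0.7).
        ["cd-hit", "-i", f"{name}.faa", "-o", f"{name}.c65.faa", "-c", "0.65", "-n", "4"],
        # 2. Preliminary alignment (stdout -> .aln), trimming, FastTree (stdout -> .nwk).
        ["mafft", "--auto", f"{name}.c65.faa"],
        ["trimal", "-in", f"{name}.aln", "-out", f"{name}.trim.aln", "-gappyout"],
        ["FastTree", "-lg", "-boot", "100", f"{name}.trim.aln"],
        # 3. Treemmer pruning of the preliminary tree would happen here (not shown).
        # 4. Final alignment (stdout -> .final.aln), trimming, IQ-TREE inference.
        ["mafft", "--maxiterate", "1000", "--localpair", f"{name}.pruned.faa"],
        ["trimal", "-in", f"{name}.final.aln", "-out", f"{name}.final.trim.aln", "-gt", "0.3"],
        ["iqtree2", "-s", f"{name}.final.trim.aln", "-m", "MFP", "-bb", "1000",
         "-alrt", "1000", "-bnni", "-T", str(threads)],
    ]


for cmd in phylo_commands("viperin"):
    print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps each flag explicit and lets the same list be passed to `subprocess.run` once the pipeline is ready to execute.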

Figure 5.

Multi-panel figure showing phylogenetic trees for several defense systems that form part of the archaeal core defensome. The trees include homologous sequences from Archaea, Bacteria and Eukaryotes and illustrate the evolutionary relationships of these systems across domains of life. Node markers indicate branches with strong bootstrap support. 

Evolutionary origins of the core-defensome. Phylogenetic trees of systems in the archaeal core-defensome. Branch colors indicate taxonomic classification: blue for bacteria, red for eukaryotes, green for archaea, and pink for Asgard archaea. The argonaute tree was constructed from candidates identified with Padloc, and colored ranges indicate DefenseFinder subtype HMM-profile annotations (long-A, long-B, short, or COG1431_pAgo) when available; leaves without colored ranges correspond to eukaryotic eAgos or Padloc-predicted argonautes not classified by DefenseFinder. Bootstrap support values (≥70%) are marked as dots at the corresponding nodes. Trees for viperins (B) and CBASS (C) were rooted using MoaA and 2′-5′-oligoadenylate synthetase (OAS) sequences, respectively, while other trees were midpoint-rooted (A, D–F).

The amino acid sequences and phylogenetic trees used to make Fig. 5 and Supplementary Fig. 7 are deposited in Supplementary Table 10 and Supplementary Data 1.

Results

Database creation and evaluation of defense system identification tools

We curated a comprehensive database of prokaryotic genomes, including metagenome-assembled genomes, comprising 7747 archaeal and 40 000 bacterial genomes from publicly available sources (Supplementary Table 1) [30, 31]. Bacterial genomes were randomly subsampled from the 394 933 genomes available in the Genome Taxonomy Database (GTDB) [30], whereas all archaeal genomes were included. Most genomes (92.7% of bacterial and 66.9% of archaeal genomes) had >80% completeness and <5% contamination; the remainder met GTDB’s inclusion criteria of ≥50% completeness and <10% contamination [30]. The taxonomic composition of the archaeal genomes is shown in Supplementary Fig. 1. Because closely related strains often encode very different defense systems [6, 17], we retained closely related archaeal genomes to preserve resolution in system diversity.

To characterize archaeal antiviral immunity and compare it to bacterial systems, we used Padloc [16] and DefenseFinder [17]. In addition, we benchmarked CRISPR-Cas Typer [33] for the prediction of CRISPR-Cas systems, and a homology-based approach based on the REBASE database [34] for the detection of restriction-modification systems (see Methods; Supplementary Tables 2–6).

To benchmark Padloc and DefenseFinder, we excluded proteins from unpublished putative candidate defense systems (PDCs) in Padloc to ensure a fair comparison [15]. DefenseFinder and Padloc identified 150 and 118 systems, respectively (Supplementary Table 7). Overall, 50.9% of bacterial and 31.2% of archaeal defense proteins were identified by both tools (Fig. 1A, Supplementary Table 8). However, each tool also identified unique proteins: Padloc detected 1.6× more unique bacterial and 4.7× more unique archaeal proteins than DefenseFinder, with 56% of archaeal defense proteins uniquely detected by Padloc.
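Overlap fractions and fold differences of this kind follow from set operations on protein identifiers. A minimal sketch, assuming each tool's output has been reduced to a set of protein IDs and assuming percentages are taken relative to the union of proteins detected by either tool; the IDs in the example are hypothetical.

```python
from typing import Dict, Set


def compare_tools(a: Set[str], b: Set[str]) -> Dict[str, float]:
    """Overlap statistics for two sets of predicted defense proteins."""
    union = a | b
    shared = a & b
    only_a, only_b = a - b, b - a
    return {
        "shared_fraction": len(shared) / len(union),
        "only_a_fraction": len(only_a) / len(union),
        "only_b_fraction": len(only_b) / len(union),
        # Fold difference in uniquely detected proteins (tool A vs tool B).
        "unique_ratio_a_to_b": len(only_a) / len(only_b) if only_b else float("inf"),
    }


padloc_ids = {"p1", "p2", "p3", "p4"}    # hypothetical IDs
defensefinder_ids = {"p3", "p4", "p5"}   # hypothetical IDs
print(compare_tools(padloc_ids, defensefinder_ids))
```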

Figure 1.

Multi-panel figure benchmarking archaeal defense system detection and the distribution of defense systems across prokaryotes. Panel A shows Venn diagrams comparing the performance of Padloc and DefenseFinder on the bacterial and archaeal datasets. Panels B and C show the performance of prediction tools in identifying restriction–modification and CRISPR-Cas systems in archaeal genomes. Panel D plots defense system counts per genome in each domain, with archaeal genomes showing a strong skew toward lower counts. Panels E and F are bar plots of the average number of defense systems and defense system families per genome across archaeal phyla, which differ substantially across taxa. Panel G shows the phylum distribution of archaeal genomes lacking detected defense systems other than restriction–modification and CRISPR-Cas.

Benchmarking the identification of archaeal defense systems. (A) Comparison of the output from Padloc and DefenseFinder on bacterial and archaeal datasets. (B and C) Performance of tools in identifying restriction-modification systems (B) and CRISPR-Cas systems (C) within the archaeal dataset. (D) Distribution of defense systems per genome across prokaryotic domains. The archaeal genome with the highest number of defense systems belongs to the Thermoplasmata class, with 71 systems. In bacteria, members of the Polyangiaceae family (Myxococcota phylum) have the highest count, with a maximum of 167 defense systems. (E and F) Average number of defense systems (E) and defense system families (F) per genome across archaeal phyla. The dashed line indicates the average value across all Archaea. Dots represent the phylum-specific average and error bars indicate the standard deviation. (G) Taxonomic distribution of archaeal genomes without detected defense systems other than RM and CRISPR-Cas, shown by phylum. The dashed line indicates the domain-wide average.

We further evaluated tool performance on RM and CRISPR-Cas system prediction. Padloc and DefenseFinder identified RM systems in 32.9% and 26.5% of archaeal genomes, respectively (Fig. 1B). These values are markedly lower than the 81% prevalence estimated from the REBASE database (709 archaeal genomes, October 2024) [34]. Both tools failed to detect many RM systems annotated in genomes listed in REBASE.

To improve detection, we developed a custom RM prediction pipeline based on REBASE. Archaeal proteins were matched to REBASE entries using MMseqs2 (≥65% identity, ≥80% coverage), and functional assignments were based on best hits. We further excluded proteins assigned to other defense systems by Padloc and DefenseFinder and classified RM system types based on gene neighborhood analysis (see Methods; Supplementary Table 6). This approach detected RM systems in 72.8% of archaeal genomes (Fig. 1B), consistent with previous estimates from smaller archaeal genome datasets [34, 60, 61]. Due to its higher sensitivity and agreement with prior benchmarks, we used this REBASE-based approach for all downstream RM analyses.
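The best-hit assignment and exclusion steps can be sketched as below. This assumes the MMseqs2 hits have already been parsed into `(query, target, evalue)` tuples and filtered by the identity and coverage thresholds during the search; `rebase_annotation` (REBASE entry → RM type and component) and `other_system_proteins` are hypothetical inputs, and ties in e-value keep the first hit seen.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query protein, REBASE target, e-value)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep the lowest-e-value REBASE target for each query protein."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, target, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (target, evalue)
    return {q: t for q, (t, _) in best.items()}


def rm_candidates(hits: Iterable[Hit],
                  rebase_annotation: Dict[str, Tuple[str, str]],
                  other_system_proteins: Set[str]) -> Dict[str, Tuple[str, str]]:
    """Annotate candidates with REBASE (type, component); drop proteins claimed by other systems."""
    return {
        q: rebase_annotation[t]
        for q, t in best_hits(hits).items()
        if q not in other_system_proteins and t in rebase_annotation
    }


# Hypothetical hits and annotations, for illustration only.
hits = [("q1", "rbA", 1e-30), ("q1", "rbB", 1e-50), ("q2", "rbA", 1e-10), ("q3", "rbC", 1e-20)]
annotation = {"rbA": ("II", "M"), "rbB": ("I", "R"), "rbC": ("IV", "R")}
print(rm_candidates(hits, annotation, other_system_proteins={"q3"}))
```

The resulting per-protein labels are the kind of input a neighborhood-based caller, applying the type I–IV and IIG criteria from the Methods, would consume.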

We also assessed CRISPR-Cas prevalence in Archaea using Padloc, DefenseFinder, and CRISPR-Cas Typer. These tools identified CRISPR-Cas systems in 24.5%, 16.9%, and 29.7% of archaeal genomes, respectively (Fig. 1C). The higher detection by CRISPR-Cas Typer was primarily due to putative type II-D systems, which are known to be rare in Archaea [21, 22]. Closer inspection revealed that many of these corresponded to OMEGA systems [62], including IscB-HEARO [63] (Supplementary Fig. 2), rather than true type II-D CRISPR-Cas.

In summary, Padloc provided broader coverage (Fig. 1A) and superior detection of RM (Fig. 1B) and CRISPR-Cas (Fig. 1C) than DefenseFinder. Therefore, Padloc was selected for downstream analyses of defense system distribution.

Domain-level overview of the prokaryotic defense landscape

We identified 521 796 occurrences of defense systems (including PDCs) in bacterial genomes and 58 594 systems in archaeal genomes, comprising 1 049 452 genes in total (Supplementary Tables 2 and 6). This domain-wide analysis revealed clear differences in the distribution and diversity of defense systems between Archaea and Bacteria (Fig. 1D).

Defense system abundance deviated from a Gaussian “normal” distribution in both domains, but archaeal genomes showed a strong skew toward lower system counts per genome (Fig. 1D). Overall, 98.8% of bacterial genomes encoded at least one defense system, compared to 87.2% of archaeal genomes. Excluding RM and CRISPR-Cas, these proportions dropped to 97.84% and 68.37%, respectively (Fig. 1G), highlighting a more pronounced absence of known non-RM/CRISPR systems in Archaea.

In total, we detected 269 distinct system types in Bacteria and 197 in Archaea, with archaeal types representing 73.2% of all systems identified. Among these, only one system type, the viperin system associated with a tetratricopeptide repeat (TPR) domain-containing protein, was found exclusively in Archaea, consistent with previous reports [64]. While CRISPR-Cas types III-G and IV-C were only detected in archaeal genomes in our dataset, they are not considered archaeal-specific, and their functional roles as defense systems remain to be experimentally validated. By contrast, 75 defense systems were exclusive to Bacteria, comprising 16 450 occurrences (3.15% of all bacterial hits), underscoring their relative rarity.

Bacterial genomes had an average of 14.4 defense system occurrences and 5.6 distinct types per genome, compared to 6.1 occurrences and 4.2 types in archaeal genomes (Fig. 1E and F; Supplementary Fig. 3B). Known defense systems were absent across a broad range of archaeal lineages (Fig. 1G), while their absence in Bacteria was rare (1.2%) and largely limited to taxa with reduced genomes and intracellular lifestyles (Supplementary Fig. 3A), as reported previously [17].

The average genome size was 3.8 Mb for bacteria and 1.8 Mb for archaea, and a substantial fraction of archaeal genomes belong to DPANN lineages (acronym for Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanohaloarchaeota and Nanoarchaeota), organisms with small genomes and cells [65] (Supplementary Fig. 1). This raised the question of whether the lower defense content in Archaea is primarily a consequence of their smaller genomes. Genome size was moderately correlated with defense system abundance in Bacteria (Spearman ρ = 0.476, P-value < 2.2 × 10⁻⁶) and weakly correlated in Archaea (Spearman ρ = 0.388, P-value = 1.169 × 10⁻⁸) (Supplementary Fig. 4A). Linear regression indicated that genome size explained 21.8% of the variance in system counts per genome (R² = 0.218, F(1, 47 745) = 13 330, P < 0.001). After accounting for genome size, residual system abundance remained significantly higher in Bacteria than in Archaea (Wilcoxon W = 1.47, P < 0.001), indicating that domain contributes additional variation (Supplementary Fig. 4B). Variance partitioning attributed 20% of the variation to genome size and 2% to domain, with 78.3% unexplained (Supplementary Fig. 4C). Normalizing counts per megabase confirmed that system density remained higher in Bacteria (Supplementary Fig. 4D). These results show that genome size partially explains the lower defense content in Archaea, but most variation is not attributable to genome size or domain alone, suggesting additional lineage-specific or ecological influences.
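The variance partitioning reported above splits explained variation into a fraction unique to genome size, a fraction unique to domain, a shared fraction, and a residual, using adjusted R² values as vegan's varpart() does for two explanatory sets. A minimal NumPy sketch of that two-set decomposition; the 0/1 domain coding, function names, and simulated data are assumptions for illustration.

```python
import numpy as np


def adj_r2(X: np.ndarray, y: np.ndarray) -> float:
    """Adjusted R^2 of an OLS fit of y on X (an intercept column is added here)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def varpart_two(x1: np.ndarray, x2: np.ndarray, y: np.ndarray) -> dict:
    """Two-set variance partitioning into fractions [a], [b], [c] and residual [d]."""
    n = len(y)
    ab = adj_r2(x1.reshape(n, -1), y)               # x1 alone: [a] + [b]
    bc = adj_r2(x2.reshape(n, -1), y)               # x2 alone: [b] + [c]
    abc = adj_r2(np.column_stack([x1, x2]), y)      # both:     [a] + [b] + [c]
    return {
        "a_x1_only": abc - bc,
        "b_shared": ab + bc - abc,
        "c_x2_only": abc - ab,
        "d_residual": 1 - abc,
    }


# Simulated data only: counts depend on genome size plus a small domain effect.
rng = np.random.default_rng(1)
n = 500
genome_size = rng.uniform(0.5, 8.0, n)
domain = (rng.random(n) < 0.2).astype(float)  # 1 = archaeon (hypothetical coding)
counts = 2.0 * genome_size + 1.5 * (1 - domain) + rng.normal(0, 3, n)
print(varpart_two(genome_size, domain, counts))
```

By construction the four fractions sum to 1; the shared fraction can be slightly negative when predictors act in opposite directions, which is why reported percentages need not add up exactly.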

Defense system landscapes differ between Archaea and Bacteria

To compare the immune landscapes of Archaea and Bacteria, we identified the 20 most prevalent defense systems in each domain. Because several systems rank among the top 20 in both domains, this analysis yielded 28 distinct systems in total, representing the most widespread components of the prokaryotic immune repertoire, hereafter referred to as the core defensome (Fig. 2).
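The core defensome is the union of the two domain-specific top-20 lists, so its size equals 40 minus the number of systems shared by both lists (28 implies 12 shared). A minimal sketch, assuming prevalences are given as a mapping from system name to fraction of genomes; the names and values in the example are hypothetical, and breaking ties by name is an assumption.

```python
from typing import Dict, List, Set


def top_k(prevalence: Dict[str, float], k: int = 20) -> List[str]:
    """Names of the k most prevalent systems (ties broken by name for determinism)."""
    ranked = sorted(prevalence.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def core_defensome(arc: Dict[str, float], bac: Dict[str, float], k: int = 20) -> Set[str]:
    """Union of the top-k systems in each domain."""
    return set(top_k(arc, k)) | set(top_k(bac, k))


# Hypothetical prevalences with k = 2: one shared system, so 2 * 2 - 1 = 3 in the union.
arc = {"A": 0.9, "B": 0.5, "C": 0.4}
bac = {"A": 0.8, "D": 0.6, "B": 0.1}
print(sorted(core_defensome(arc, bac, k=2)))  # ['A', 'B', 'D']
```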

Restriction-modification (RM) systems were the most abundant in both domains, found in 71.76% of archaeal and 75.54% of bacterial genomes (Fig. 2, left panel). Despite similar prevalence, RM systems constitute a disproportionately large fraction of the archaeal defense repertoire: 38.3% of all archaeal defense proteins compared to 13.8% in Bacteria (Supplementary Fig. 5). This is not explained by a higher RM copy number (2.0 vs 2.9 per genome in Archaea and Bacteria, respectively) but likely reflects the reduced representation of other systems in Archaea, which increases the relative contribution of RM.

CRISPR-Cas systems were present in 31.1% of archaeal genomes and 39% of bacterial genomes (Fig. 2, left panel), contrasting sharply with earlier reports of 75–85% prevalence in Archaea [21, 22]. Because archaeal genomes in our dataset had lower average completeness than bacterial ones (85.2% vs 95.6%), we examined whether completeness influences measured defense prevalences. When only genomes with ≥ 90% completeness (3472 archaeal and 34 273 bacterial) were included in the analysis, archaeal prevalence increased most notably for CRISPR-Cas and SoFic, reaching 51.5% and 31.1%, respectively, whereas other archaeal systems were only mildly affected (Fig. 2, right panel and Supplementary Fig. 6). Bacterial prevalences showed comparatively minor shifts.

We further quantified the dependence of prevalence on completeness and domain using logistic regression. From this model, we estimate that CRISPR-Cas prevalence in a dataset with 100% complete genomes would be ∼55% (95% CI: 53.1–56.7%) in Archaea and ∼43% (95% CI: 42.7–43.8%) in Bacteria (Supplementary Fig. 6). Thus, incomplete genomes account for a substantial part, but not all, of the discrepancy with earlier reports [66], and our estimates are consistent with a recent estimate of 52% prevalence of CRISPR-Cas in Archaea [67], published while this work was under review.
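Extrapolated prevalences of this kind come from evaluating the fitted logistic model at a chosen covariate row and back-transforming. A minimal sketch of the Wald-type interval described in the Methods: the interval is built on the logit scale and then passed through the inverse logit, so it always stays within (0, 1). The covariate coding (for example `[1, 100, 1]` for intercept, 100% completeness, and a hypothetical 0/1 archaeal indicator) and the absence of an interaction term are assumptions; the coefficients would come from the fitted model.

```python
import numpy as np


def inv_logit(v):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + np.exp(-v))


def predict_prevalence(beta, cov, x, z=1.959964):
    """Predicted probability and Wald 95% CI from a fitted logistic model.

    beta: coefficient vector; cov: its covariance matrix; x: covariate row
    including the intercept term. The CI is formed on the logit scale.
    """
    beta, cov, x = (np.asarray(a, dtype=float) for a in (beta, cov, x))
    eta = x @ beta                    # linear predictor (log-odds)
    se = float(np.sqrt(x @ cov @ x))  # standard error of the linear predictor
    return inv_logit(eta), inv_logit(eta - z * se), inv_logit(eta + z * se)
```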

Interestingly, six of the ten most prevalent systems are putative defense candidates (PDCs) (Fig. 2), single-gene systems recently identified through a “guilt-by-embedding” approach [15]. Several Hma-embedded candidates (HECs), including HEC-06, have demonstrated antiviral activity in experimental assays[15], but experimental validation is pending for most PDCs in the core defensome. PDC-S01 is the fourth most common defense across all prokaryotes, and five PDCs (PDC-S01, PDC-S27, PDC-S70, PDC-S09, and PDC-M05) are more abundant in Archaea than in Bacteria, highlighting the relevance of PDCs to archaeal immunity and their potential as targets for future characterization.

Among the 15 experimentally validated antiviral defense systems in the core defensome, only viperins are more prevalent in Archaea than in Bacteria, while argonautes and PT showed increased prevalence in Archaea after adjusting for completeness (Fig. 2 and Supplementary Fig. 6). All other validated antiviral systems are more common in Bacteria, suggesting either greater evolutionary diversification in this domain, differences in cellular machineries across domains that may limit system compatibility or operation, or archaeal underrepresentation in current defense models.

Lineage-dependent differences in the archaeal immune pangenome

We analyzed the prevalence of experimentally validated antiviral systems with >3% prevalence, namely RM, CRISPR-Cas, SoFic, AbiE, viperin, DNA phosphorothioation, CBASS, and argonaute systems, across archaeal phyla with at least 10 genomes, using binomial GLMs fitted to presence/absence data and estimated marginal means to compare each phylum against the overall archaeal mean (Fig. 3 and Supplementary Table 9). These systems show a heterogeneous (“patchy”) distribution across archaeal lineages, a hallmark of defense system evolution likely shaped by frequent horizontal gene transfer [17]. Similar patterns are observed for PDCs (Supplementary Fig. 7).

Figure 3.

Bar charts showing the prevalence of core defense systems across archaeal phyla. Each bar represents the percentage of genomes within a phylum encoding a given defense system, with a reference line indicating the average prevalence across all archaeal genomes. The figure highlights differences among phyla, with some showing higher or lower prevalence than the archaeal domain-wide mean. Error bars indicate uncertainty around the estimated prevalence values.

Taxonomic distribution of the archaeal core-defensome. The bar charts represent the percentage of genomes in each phylum containing the defense system. The red dashed line indicates the average prevalence of each defense system across all archaeal genomes. Blue bars indicate a significant difference in the prevalence of a system against the overall archaeal mean (dark blue, overrepresented; light blue, underrepresented). Error bars indicate the confidence intervals.

RM and CRISPR-Cas are underrepresented in DPANN archaea, which instead show enrichment in AbiE, SoFic, and multiple PDCs. This mirrors patterns seen in host-dependent bacteria with small genomes (e.g. Chlamydiota and Patescibacteria), suggesting a selective pressure for compact, single-gene systems in symbiotic or parasitic lineages (Supplementary Figs 3 and 7).

Halobacteriota show broad enrichment in both core and PDC defenses (Fig. 3, Supplementary Fig. 7), with few exceptions (e.g. Mokosh and PDC-S07, which are underrepresented). In contrast, Thermoproteota generally show lower defense prevalence, except for RM, CRISPR-Cas, and argonautes. Thermoplasmatota exhibit low CRISPR-Cas prevalence but retain RM, AbiE, viperin, CBASS, and several PDCs (e.g. PDC-S01, PDC-S27, and PDC-S04). Asgardarchaeota genomes are enriched in RM, viperins, CBASS, argonautes, and selected PDCs (S70 and S09), but show underrepresentation of SoFic and most other PDCs (Fig. 3, Supplementary Fig. 7).

CRISPR-Cas prevalence also varied across temperature classes. It was highest in the hyperthermophilic phylum Methanobacteriota_B and in thermophilic classes of the Thermoproteota and Halobacteriota (>50% prevalence), compared with their mesophilic counterparts (10–25% and 0–37%, respectively). It was also enriched in the mesophilic Altiarchaeota (∼50%) (Fig. 4A). In contrast, CRISPR-Cas systems are notably underrepresented in DPANN archaea across all temperature categories (Fig. 4A). The distribution of CRISPR-Cas types is consistent with prior studies: types I and III dominate, types II and IV are rare, and type VI is absent (Fig. 4B) [21, 22].

Figure 4.

CRISPR-Cas distribution in Archaea. (A) Prevalence of CRISPR-Cas systems across archaeal lineages, with the red line indicating the average prevalence for the domain. Phylum names are in bold, while class names are shown in regular font. Lineages highlighted in green have above-average CRISPR-Cas prevalence, while those in yellow have below-average prevalence. Asterisk (*) denotes DPANN lineages. (B) Relative abundance of CRISPR-Cas types across archaeal phyla. Numbers in parentheses indicate the total number of genomes analyzed for each lineage.

Because optimal growth temperature (OGT) varies within lineages, we predicted the OGT of every archaeal genome using Tome. We then analyzed the relationship between OGT and core system abundance using negative binomial generalized linear models (Supplementary Fig. 8). CRISPR-Cas abundance is positively associated with OGT, with a 35.7% increase in Cas counts per 10°C increment (P < 0.001). RM and SoFic show negative relationships with OGT (−13.2% and −42.2% per 10°C, respectively; both P < 0.001) (Supplementary Fig. 8). These associations explain ∼13.4–15.4% of the variance (Nagelkerke's pseudo-R²). Weak but significant negative associations were identified for PT and CBASS, and a positive association was found for argonautes, although each of these accounts for <1% of the variance (Supplementary Fig. 8). These results are consistent with previous reports linking higher CRISPR-Cas abundance to hosts with high OGTs and SoFic enrichment to lower-temperature hosts [68].
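
Negative binomial GLMs usually use a log link; this is the default in MASS::glm.nb [45], for example. Under a log link, a percent change per 10 °C corresponds to a log-scale slope β through 100 × (exp(10β) − 1). The Python sketch below assumes that parameterization, since the study's raw coefficients are not given in this section. It converts between the two scales and back-calculates the per-degree slopes implied by the reported percentages. The helper names are hypothetical.

```python
import math

def pct_change(beta_per_degree: float, step: float = 10.0) -> float:
    """Percent change in the expected count for a `step`-degree rise (log link assumed)."""
    return 100.0 * (math.exp(beta_per_degree * step) - 1.0)

def beta_from_pct(pct: float, step: float = 10.0) -> float:
    """Log-scale slope per degree implied by a percent change per `step` degrees."""
    return math.log(1.0 + pct / 100.0) / step

# Percent changes per 10 degC as reported in the text above.
reported = {"CRISPR-Cas": 35.7, "RM": -13.2, "SoFic": -42.2}
for system, pct in reported.items():
    beta = beta_from_pct(pct)
    # Effects compound multiplicatively: a 20 degC rise scales counts by exp(20 * beta).
    print(f"{system}: beta = {beta:+.4f}/degC, "
          f"per 10 degC = {pct_change(beta):+.1f}%, per 20 degC = {pct_change(beta, 20):+.1f}%")
```

Under this assumption, the CRISPR-Cas estimate implies a slope of about 0.031 per °C. Two 10 °C steps then compound to 1.357² ≈ 1.84, an increase of about 84% rather than 71.4%. Such extrapolations are only meaningful within the observed OGT range.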

Next, we quantified the contribution of each factor to defense abundance in each domain using Nagelkerke's pseudo-R² (Supplementary Fig. 4E). Genome size explained the largest fraction of the variability in both domains, followed by phylum and completeness, whereas OGT contributed negligibly in Archaea. Completeness and temperature have smaller effects at the whole-defensome scale. Even so, they strongly influence the prevalence of specific systems such as CRISPR-Cas and SoFic, underscoring the heterogeneity among systems. Together, all predictors explained 38.5% of the variation in Archaea and 36.6% in Bacteria (Supplementary Fig. 4E). These predictors therefore only partially explain the variation in archaeal defense content. Moreover, Bacteria encode substantially more defenses yet are similarly affected by the same predictors, so these factors are unlikely to account for the disparity between domains by themselves. This suggests that other processes, such as HGT, mobile genetic elements, genome reduction, symbiosis, viral pressure, and/or methodological biases, play dominant roles in shaping archaeal immune repertoires.
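
Nagelkerke's pseudo-R² starts from the Cox–Snell index, 1 − exp(2(ℓ0 − ℓ1)/n). It then divides that index by its maximum attainable value, 1 − exp(2ℓ0/n). Here ℓ1 and ℓ0 are the log-likelihoods of the fitted model and of an intercept-only null model, and n is the number of observations. The Python sketch below implements this formula from user-supplied log-likelihoods. For fitted models in R, the performance package [47] provides r2_nagelkerke(). The log-likelihood values in the example are hypothetical. This section does not specify how the study attributed variance to individual predictors (Supplementary Fig. 4E).

```python
import math

def nagelkerke_r2(loglik_null: float, loglik_model: float, n: int) -> float:
    """Nagelkerke pseudo-R^2 from null and fitted-model log-likelihoods.

    Assumes loglik_null < 0, which holds for binomial and count likelihoods.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Cox-Snell index: 1 - (L0 / L1)^(2/n), written with log-likelihoods.
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    # Upper bound of Cox-Snell, reached when the fitted likelihood equals 1.
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell

# Hypothetical values: 100 observations, null vs. fitted model.
print(round(nagelkerke_r2(loglik_null=-69.3, loglik_model=-50.0, n=100), 3))
```

The rescaling matters because the raw Cox–Snell index cannot reach 1 for discrete outcomes. Nagelkerke's version therefore spans the full 0–1 range, which makes values easier to compare across models.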

Evolutionary origins of the prokaryotic core-defensome

While the evolutionary trajectories of RM and CRISPR-Cas systems have been extensively reviewed elsewhere [20, 22, 60, 61, 69–73], we focus here on the remaining components of the prokaryotic immune repertoire.

Several components of eukaryotic innate immunity are evolutionarily linked to prokaryotic systems [27]. Although many of these likely originated in Bacteria, others, such as viperins and argonautes, appear to have archaeal origins, particularly within Asgardarchaeota, the closest archaeal relatives of eukaryotes [18]. These systems are especially prevalent in Asgardarchaeota but are also found across other archaeal phyla (Fig. 3).

To explore whether the current distribution of core defense systems reflects vertical inheritance or HGT, we analyzed the phylogenies of archaeal and bacterial homologs. In agreement with previous studies [51, 59, 64, 74], our trees support archaeal viperins and argonautes as ancestral immune systems with deep phylogenetic roots (Fig. 5B and D). Both systems show a patchy distribution with multiple inter-domain HGT events. However, they are notably overrepresented in Asgardarchaeota, which make up only 3.6% of the archaeal dataset but account for 7.4% of archaeal viperins and 16% of archaeal argonautes (Fig. 3).
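
The overrepresentation of viperins and argonautes in Asgardarchaeota can be summarized as a simple enrichment ratio. This is the fraction of a system's archaeal occurrences found in a lineage, divided by that lineage's fraction of the archaeal dataset; a value of 1 means no enrichment. The short Python sketch below applies this ratio to the percentages quoted above. The ratio is an illustrative summary, not a statistic reported in the study.

```python
def enrichment_ratio(pct_of_system: float, pct_of_genomes: float) -> float:
    """Share of a system's archaeal occurrences over the lineage's share of genomes."""
    if pct_of_genomes <= 0:
        raise ValueError("lineage must contribute genomes to the dataset")
    return pct_of_system / pct_of_genomes

ASGARD_PCT_OF_GENOMES = 3.6  # Asgardarchaeota share of the archaeal dataset (%)
for system, pct in {"viperin": 7.4, "argonaute": 16.0}.items():
    print(f"{system}: {enrichment_ratio(pct, ASGARD_PCT_OF_GENOMES):.1f}-fold")
```

With these figures, Asgardarchaeota carry about twice (7.4/3.6 ≈ 2.1) the viperin share and more than four times (16/3.6 ≈ 4.4) the argonaute share expected from their genome count alone. If occurrences are counted per system rather than per genome, multi-copy genomes inflate the ratio, so the two bases should not be mixed.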

Homologs of both systems are also found in eukaryotes [51, 64]. Our analysis supports an archaeal origin for eukaryotic argonautes (eAgos), specifically from long-A argonautes, consistent with previous work [51, 74, 75]. Most asgardarchaeal argonautes fall within the clade from which eAgos appear to have originated (Fig. 5D), in agreement with Leão et al. [18]. Similarly, the largest eukaryotic viperin cluster forms a sister group to a clade of archaeal proteins that includes most asgardarchaeal viperins (Fig. 5B and Supplementary Fig. 9). Although this supports a likely archaeal origin for eukaryotic viperins, our analysis does not pinpoint a specific donor phylum.

Our results also align with those of Culbertson and Levin [59], who identified several bacterial-to-eukaryote HGT events involving viperins. However, in contrast to Shomar et al. [19], who reported clear phylogenetic separation between archaeal and bacterial viperins, we observed no strict domain-based division (Fig. 5B and Supplementary Fig. 9).

In contrast, systems such as SoFic, CBASS, AbiE, and DNA phosphorothioation (Dnd) appear to have a bacterial origin, with phylogenies indicating multiple bacteria-to-archaea transfer events (except for PT, which displays a high degree of domain separation) (Fig. 5A and C–E). Although our phylogenies suggest likely origins for several systems, alternative scenarios, such as vertical inheritance and differential loss, cannot be excluded, even if the current data support them less. Notably, although archaeal AbiE systems share a bacterial ancestor, they tend to cluster separately, suggesting some degree of domain specificity (Fig. 5D). AbiE is a type IV toxin-antitoxin system known to induce abortive infection by acting on an unknown cellular target [76, 77]. Its apparent domain restriction may reflect differences in host-specific targets.

PDCs display diverse evolutionary histories. Systems such as S09, S05, S02, S13, S04, S07, S12, and HEC-06 are predominantly found in Bacteria. Archaea likely acquired them via HGT, and they show limited subsequent diversification (Supplementary Fig. 10). Others, such as S01, S27, S70, and M05, are notably enriched in Archaea, suggesting a possible archaeal origin for some of these systems (Supplementary Fig. 10).

Among these, PDC-S01 and PDC-S27 are especially abundant in Archaea, which account for 25% and 43% of their total detected occurrences, respectively. In absolute numbers, we detected 9986 S01 and 3503 S27 systems in Archaea versus 29 494 and 4562 in Bacteria. PDC-S01 has undergone multiple transfers across domains, followed by some degree of domain-specific diversification (Supplementary Fig. 10). Conversely, S27 appears to have originated in Bacteria and subsequently spread into Archaea (Supplementary Fig. 10). Structurally, both encode an ATPase domain fused to a member of the PD-(D/E)-X-K superfamily, which is associated with nucleic acid targeting [15].
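
The archaeal percentages quoted above follow directly from the absolute counts: archaeal occurrences divided by the total across both domains. The Python check below reproduces them. These are shares of detected systems, not per-genome densities; computing densities would also require the number of bacterial genomes analyzed, which this section does not give.

```python
def archaeal_share(archaea: int, bacteria: int) -> float:
    """Percentage of all detected occurrences of a system that are archaeal."""
    return 100.0 * archaea / (archaea + bacteria)

# Absolute counts reported in the text: (Archaea, Bacteria).
detected = {"PDC-S01": (9986, 29494), "PDC-S27": (3503, 4562)}
for pdc, (arc, bac) in detected.items():
    print(f"{pdc}: {archaeal_share(arc, bac):.1f}% archaeal")
```

This gives about 25.3% for S01 and 43.4% for S27, matching the rounded values in the text.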

PDC-M05 is even more strongly enriched in Archaea (59% of all detected instances) (Supplementary Fig. 10). Its architecture, comprising a nucleotidyltransferase and a HEPN-domain protein, is characteristic of known prokaryotic toxin-antitoxin modules [78, 79], although its antiviral role remains to be confirmed. Similarly, PDC-S70, which encodes a PIN nuclease domain, is predominantly archaeal (56% of all counts). It shows a deep phylogenetic signal consistent with a possible archaeal origin, followed by multiple transfers into Bacteria (Supplementary Fig. 10).

Although most PDCs lack experimental validation, their high representation in Archaea and molecular features suggest they are functionally relevant and may represent archaeal contributions to the immune repertoire. These systems are strong candidates for future characterization.

Discussion

Archaea remain the least studied domain of life, and their antiviral strategies are no exception. While CRISPR-Cas immunity has been characterized in detail [20, 80], the broader archaeal immune landscape has received little attention. This limits our understanding of how innate immunity evolved across the domains of life.

We present the most taxonomically comprehensive analysis of archaeal antiviral defenses to date. We show that Archaea encode fewer and less diverse known defense systems than Bacteria, even after accounting for genome completeness and lineage structure. Our revised CRISPR-Cas prevalence estimates (∼50–55%), based on this archaeal-enriched dataset, indicate that earlier figures derived from limited sampling overestimated archaeal CRISPR-Cas prevalence. They also highlight the impact of archaeal underrepresentation and methodological biases in current defense detection frameworks.

A major challenge in benchmarking archaeal immunity lies in the available detection frameworks. Tools such as DefenseFinder and PADLOC rely primarily on homology-based searches to identify defense genes. These tools were developed using large bacterial datasets, bacterial-centric protein profiles, and antiviral activity validated in models such as E. coli [11–13], and their sensitivity and specificity for archaeal homologs remain to be evaluated. Importantly, domain sharing among defense systems could lead to unrelated proteins being misclassified as defense systems, an issue particularly relevant for single-gene systems. Nevertheless, these tools are currently the most systematic approach to prokaryotic defense system detection and are widely adopted in the field.

Archaeal genomes are also underrepresented in genomic databases (only ∼2% of high-quality assemblies [30]). They are frequently incomplete and typically originate from uncultivated or genetically intractable lineages [23, 81–83]. Detecting archaeal homologs of various bacterial and eukaryotic proteins, including but not limited to antiviral systems, often requires custom approaches such as novel HMM profiles or structure-guided searches [84–90]. Archaea also have distinctive features in information processing and cell envelope biology. As a result, some antiviral strategies may be mechanistically distinct enough to escape homology-based detection, or these features may favor the evolution of new defense systems. Either possibility would contribute to the apparent disparity between archaeal and bacterial immune landscapes. Therefore, observed archaeal prevalence values should be interpreted conservatively, and comparisons with bacterial systems must account for these detection biases.

Explaining this archaeal-bacterial disparity in defense content remains challenging because genome size, phylum, genome completeness, and temperature only partially account for the observed variation. Ecological factors, particularly viral dynamics, may offer a more compelling explanation. Previous work has shown that environments with high viral abundance and low viral diversity are associated with higher CRISPR-Cas prevalence, whereas phylogenetic relatedness plays a limited role [91]. Symbiotic or parasitic lifestyles may further influence the selection of specific defense systems. However, such lifestyles are unlikely to fully account for the general trend toward reduced defense content and diversity observed across archaeal lineages.

Despite these limitations, our analysis reveals distinct features of archaeal immunity. Many of the most prevalent defense systems are single-gene putative defense candidates (PDCs), which are often not associated with known defense islands [15]. Although PDCs remain unvalidated, their abundance and taxonomic breadth point to a large pool of overlooked systems. Interestingly, three prevalent PDCs (S01, S70, and M05) are ancestral and likely of archaeal origin. Together with CRISPR-Cas, they are rare examples of putative archaeal contributions to the prokaryotic immune repertoire.

These findings highlight the need for archaeal-adapted discovery strategies. Recent approaches such as regulatory motif mining in archaeal viruses [92], machine learning [93], and guilt-by-embedding analysis [15] have already uncovered hundreds of new candidate anti-defense genes and defense systems. Moreover, exploring genomic regions enriched in mobile genetic elements (MGEs), beyond classical defense islands, has proven fruitful in both Bacteria [94–98] and Archaea [99, 100]. Such exploration will be essential for capturing the full diversity of archaeal immunity. The discovery of viral inhibitors of archaeal defenses also remains limited [101–105], further underscoring the need to expand beyond conventional approaches.

Our evolutionary analyses align with previous observations that defense systems evolve primarily through horizontal gene transfer rather than vertical inheritance. They also reinforce the view that eukaryotic immunity is a mosaic shaped predominantly by bacterial and, to a lesser extent, archaeal contributions. Eukaryotic homologs of viperins and argonautes appear to have originated in Archaea, highlighting the evolutionary relevance of archaeal immunity. While alternative evolutionary scenarios may exist, they are not strongly supported by the current data. Finally, Archaea have a lower global biomass than Bacteria (∼10% of bacterial biomass) [106]. This may limit both their genomic diversity and the frequency of inter-domain encounters, contributing to the observed bias toward bacteria-to-archaea [107] and bacteria-to-eukaryote horizontal transfers. Bacteria-to-eukaryote transfers are thought to underlie the bacterial origins of several innate immune systems in eukaryotes [27, 59].

In conclusion, within the limits of currently recognized defense families and annotation tools, Archaea appear to encode fewer and less diverse known defense systems than Bacteria. Predictors such as genome size, optimal growth temperature, assembly completeness, and taxonomic grouping explain part of the within-Archaea variation. However, it remains unclear whether the cross-domain disparity reflects true ecological differences, methodological underdetection, or the presence of archaeal-specific strategies; this question will be the focus of future research. Our results challenge the assumption that archaeal and bacterial immune landscapes are broadly similar. They also emphasize the need for archaeal-adapted tools and validation strategies to uncover the full diversity of archaeal immunity and, by extension, to deepen our understanding of immunity across all domains of life.

Supplementary Material

gkag225_Supplemental_Files

Acknowledgements

We acknowledge the support of the EcoCluster initiative at the Department of Biology of the University of Copenhagen for access to resources used for bioinformatics analyses.

Author contributions: L.M-A. conceived the paper, performed the analyses, and wrote the manuscript. X.P. revised and approved the manuscript.

Contributor Information

Laura Martínez-Alvarez, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark.

Xu Peng, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark.

Supplementary data

Supplementary data is available at NAR online.

Conflict of interest

None declared.

Funding

X.P. is supported by the Danish Council for Independent Research/Natural Sciences [DFF-0135-00402 and 10.46540/4264-00120B] and Novo Nordisk Foundation/Hallas Moeller Ascending Investigator Grant [NNF17OC0031154]. Funding for biocomputing resources from the Danish e-Infrastructure Consortium, grant DeiC-KU-N1-2024089 to L.M-A. Funding to pay the Open Access publication charges for this article was provided by the Danish Council for Independent Research/Natural Sciences [DFF-0135-00402 and 10.46540/4264-00120B].

Data availability

Scripts are available at: https://github.com/laura-MtA/Martinez-Alvarez_et.al._2026 and Zenodo at https://doi.org/10.5281/zenodo.18757827.

References

  • 1. Brockhurst  MA, Harrison  E, Hall  JPJ  et al.  The ecology and evolution of pangenomes. Curr Biol. 2019;29:R1094–103. 10.1016/j.cub.2019.08.012. [DOI] [PubMed] [Google Scholar]
  • 2. Frost  LS, Leplae  R, Summers  AO  et al.  Mobile genetic elements: the agents of open source evolution. Nat Rev Micro. 2005;3:722–32. 10.1038/nrmicro1235. [DOI] [PubMed] [Google Scholar]
  • 3. Rocha  EPC, Bikard  D. Microbial defenses against mobile genetic elements and viruses: who defends whom from what?. PLoS Biol. 2022;20:e3001514. 10.1371/journal.pbio.3001514. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Koonin  EV, Makarova  KS, Wolf  YI  et al.  Evolutionary entanglement of mobile genetic elements and host defence systems: guns for hire. Nat Rev Genet. 2020;21:119–31. 10.1038/s41576-019-0172-9. [DOI] [PubMed] [Google Scholar]
  • 5. Bernheim  A, Sorek  R. The pan-immune system of bacteria: antiviral defence as a community resource. Nat Rev Micro. 2020;18:113–9. 10.1038/s41579-019-0278-2. [DOI] [PubMed] [Google Scholar]
  • 6. Georjon  H, Bernheim  A. The highly diverse antiphage defence systems of bacteria. Nat Rev Micro. 2023;21:686–700., 10.1038/s41579-023-00934-x. [DOI] [PubMed] [Google Scholar]
  • 7. Mayo-Muñoz  D, Pinilla-Redondo  R, Birkholz  N  et al.  A host of armor: prokaryotic immune strategies against mobile genetic elements. Cell Rep. 2023;42:112672. 10.1016/j.celrep.2023.112672. [DOI] [PubMed] [Google Scholar]
  • 8. Makarova  KS, Wolf  YI, Snir  S  et al.  Defense islands in bacterial and archaeal genomes and prediction of novel defense systems. J Bacteriol. 2011;193:6039–56. 10.1128/JB.05535-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Tesson  F, Bernheim  A. Synergy and regulation of antiphage systems: toward the existence of a bacterial immune system?. Curr Opin Microbiol. 2023;71:102238. 10.1016/j.mib.2022.102238. [DOI] [PubMed] [Google Scholar]
  • 10. Wu  Y, Garushyants  SK, van den Hurk  A  et al.  Bacterial defense systems exhibit synergistic anti-phage activity. Cell Host Microbe. 2024;32:557–572. 10.1016/j.chom.2024.01.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Doron  S, Melamed  S, Ofir  G  et al.  Systematic discovery of antiphage defense systems in the microbial pangenome. Science. 2018;359:eaar4120. 10.1126/science.aar4120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Gao  L, Altae-Tran  H, Böhning  F  et al.  Diverse enzymatic activities mediate antiviral immunity in prokaryotes. Science. 2020;369:1077–84. 10.1126/science.aba0372. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Vassallo  CN, Doering  CR, Littlehale  ML  et al.  A functional selection reveals previously undetected anti-phage defence systems in the E. coli pangenome. Nat Microbiol. 2022;7:1568–79. 10.1038/s41564-022-01219-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Millman  A, Melamed  S, Leavitt  A  et al.  An expanded arsenal of immune systems that protect bacteria from phages. Cell Host Microbe. 2022;30:1556–1569.e5. 10.1016/j.chom.2022.09.017. [DOI] [PubMed] [Google Scholar]
  • 15. Payne  LJ, Hughes  TCD, Fineran  PC  et al. 2024; New antiviral defences are genetically embedded within prokaryotic immune systems. bioRxiv, 10.1101/2024.01.29.577857, 30 January 2024, preprint; not peer reviewed. [DOI]
  • 16. Payne  LJ, Todeschini  TC, Wu  Y  et al.  Identification and classification of antiviral defence systems in bacteria and archaea with PADLOC reveals new system types. Nucleic Acids Res. 2021;49:10868–78. 10.1093/nar/gkab883. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Tesson  F, Hervé  A, Mordret  E  et al.  Systematic and quantitative view of the antiviral arsenal of prokaryotes. Nat Commun. 2022;13:2561. 10.1038/s41467-022-30269-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Leão  P, Little  ME, Appler  KE  et al.  Asgard archaea defense systems and their roles in the origin of eukaryotic immunity. Nat Commun. 2024;15:6386. 10.1038/s41467-024-50195-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Shomar  H, Georjon  H, Feng  Y  et al.  Viperin immunity evolved across the tree of life through serial innovations on a conserved scaffold. Nat Ecol Evol. 2024;8:1667–79. 10.1038/s41559-024-02463-z. [DOI] [PubMed] [Google Scholar]
  • 20. Altae-Tran  H, Kannan  S, Suberski  AJ  et al.  Uncovering the functional diversity of rare CRISPR-Cas systems with deep terascale clustering. Science. 2023;382:eadi1910. 10.1126/science.adi1910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Pourcel  C, Touchon  M, Villeriot  N  et al.  CRISPRCasdb a successor of CRISPRdb containing CRISPR arrays and cas genes from complete genome sequences, and tools to download and query lists of repeats and spacers. Nucleic Acids Res. 2020;48:D535–44. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Makarova  KS, Wolf  YI, Iranzo  J  et al.  Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants. Nat Rev Micro. 2020;18:67–83. 10.1038/s41579-019-0299-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Tahon  G, Geesink  P, Ettema  TJG. Expanding archaeal diversity and phylogeny: past, present, and future. Annu Rev Microbiol. 2021;75:359–81. 10.1146/annurev-micro-040921-050212. [DOI] [PubMed] [Google Scholar]
  • 24. Rinke  C, Chuvochina  M, Mussig  AJ  et al.  A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021;6:946–59. 10.1038/s41564-021-00918-8. [DOI] [PubMed] [Google Scholar]
  • 25. Eme  L, Tamarit  D, Caceres  EF  et al.  Inference and reconstruction of the heimdallarchaeial ancestry of eukaryotes. Nature. 2023;618:992–9. 10.1038/s41586-023-06186-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Zaremba-Niedzwiedzka  K, Caceres  EF, Saw  JH  et al.  Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature. 2017;541:353–8. 10.1038/nature21031. [DOI] [PubMed] [Google Scholar]
  • 27. Wein  T, Sorek  R. Bacterial origins of human cell-autonomous innate immune mechanisms. Nat Rev Immunol. 2022;22:629–38. 10.1038/s41577-022-00705-4. [DOI] [PubMed] [Google Scholar]
  • 28. van den Berg  DF, Costa  AR, Esser  JQ  et al.  Bacterial homologs of innate eukaryotic antiviral defenses with anti-phage activity highlight shared evolutionary roots of viral defenses. Cell Host Microbe. 2024;32:1427–1443. 10.1016/j.chom.2024.07.007. [DOI] [PubMed] [Google Scholar]
  • 29. Cury  J, Mordret  E, Trejo  VH  et al.  Conservation of antiviral systems across domains of life reveals novel immune mechanisms in humans. Cell Host Microbe. 2024;32:1594–1607. 10.1101/2022.12.12.520048. [DOI] [PubMed] [Google Scholar]
  • 30. Parks  DH, Chuvochina  M, Rinke  C  et al.  GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 2022;50:D785–94. 10.1093/nar/gkab776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Sayers  EW, Beck  J, Bolton  EE  et al.  Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2024;52:D33–43. 10.1093/nar/gkad1044. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Hyatt  D, Chen  G-L, LoCascio  PF  et al.  Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinf. 2010;11:119. 10.1186/1471-2105-11-119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Russel  J, Pinilla-Redondo  R, Mayo-Muñoz  D  et al.  CRISPRCasTyper: automated Identification, Annotation, and Classification of CRISPR-Cas Loci. The CRISPR Journal. 2020;3:462–9. 10.1089/crispr.2020.0059. [DOI] [PubMed] [Google Scholar]
  • 34. Roberts  RJ, Vincze  T, Posfai  J  et al.  REBASE: a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 2023;51:D629–30. 10.1093/nar/gkac975. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Steinegger  M, Söding  J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35:1026–8. 10.1038/nbt.3988. [DOI] [PubMed] [Google Scholar]
  • 36. R Core Team . R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2021.https://www.R-project.org/.
  • 37. RStudio Team . RStudio: integrated Development for R. Boston, MA. 2020. http://www.rstudio.com/.
  • 38. Wickham  H. ggplot2: elegant Graphics for Data Analysis Springer-Verlag New York}. 2016.
  • 39. Kosmidis  I, Pagui  ECK, Konis  K  et al.  brglm2: bias reduction in generalized linear models.2023.
  • 40. Lenth  RV, Banfai  B, Bolker  B  et al.  emmeans: estimated marginal means, aka least-squares means. 2025.
  • 41. Oksanen  J, Simpson  GL, Blanchet  FG  et al.  vegan: community Ecology Package. 2024.
  • 42. Dorai-Raj  S. binom: binomial confidence intervals for several parameterizations. 2020.
  • 43. Li  G, Rabe  KS, Nielsen  J  et al.  Machine learning applied to predicting microorganism growth temperatures and enzyme catalytic optima. ACS Synth Biol. 2019;8:1411–20. 10.1021/acssynbio.9b00099. [DOI] [PubMed] [Google Scholar]
  • 44. Dinno  A. dunn.test: dunn’s Test of Multiple Comparisons Using Rank Sums. 2024.
  • 45. Venables  WN, Ripley  BD  Modern Applied Statistics with S, 4th edn. New York: Springer International Publishing. [Google Scholar]
  • 46. Zeileis  A, Kleiber  C, Jackman  S. Regression models for count data in R. J Stat Soft. 2008;27:1–25. 10.18637/jss.v027.i08. [DOI] [Google Scholar]
  • 47. Lüdecke  D, Ben-Shachar  MS, Patil  I  et al.  performance: an R Package for assessment, comparison and testing of statistical models. J Open Source Software. 2021;6:3139. [Google Scholar]
  • 48. Hartig  F. DHARMa: residual diagnostics for hierarchical (multi-level /mixed) regression models. R package version 0.4.7.. 2024. https://CRAN.R-project.org/package=DHARMa.
  • 49. Pedersen  TL. patchwork: the composer of plots. 2024.
  • 50. Li  Y, Slavik  KM, Toyoda  HC  et al.  cGLRs are a diverse family of pattern recognition receptors in innate immunity. Cell. 2023;186:3261–3276. 10.1016/j.cell.2023.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Swarts  DC, Jore  MM, Westra  ER  et al.  DNA-guided DNA interference by a prokaryotic Argonaute. Nature. 2014;507:258–61. 10.1038/nature12971. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Fu  L, Niu  B, Zhu  Z  et al.  CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–2. 10.1093/bioinformatics/bts565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Katoh  K, Standley  DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80. 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Capella-Gutiérrez  S, Silla-Martínez  JM, Gabaldón  T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–3. 10.1093/bioinformatics/btp348. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Price  MN, Dehal  PS, Arkin  AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5:e9490. 10.1371/journal.pone.0009490. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Menardo  F, Loiseau  C, Brites  D  et al.  Treemmer: a tool to reduce large phylogenetic datasets with minimal loss of diversity. BMC Bioinf. 2018;19:164. 10.1186/s12859-018-2164-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Minh  BQ, Schmidt  HA, Chernomor  O  et al.  IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–4. 10.1093/molbev/msaa015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Letunic  I, Bork  P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–6. 10.1093/nar/gkab301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Culbertson EM, Levin TC. Eukaryotic CD-NTase, STING, and viperin proteins evolved via domain shuffling, horizontal transfer, and ancient inheritance from prokaryotes. PLoS Biol. 2023;21:e3002436. 10.1371/journal.pbio.3002436.
  • 60. Oliveira PH, Touchon M, Rocha EPC. The interplay of restriction-modification systems with mobile genetic elements and their prokaryotic hosts. Nucleic Acids Res. 2014;42:10618–31. 10.1093/nar/gku734.
  • 61. Fullmer MS, Ouellette M, Louyakis AS et al. The patchy distribution of restriction–modification system genes and the conservation of orphan methyltransferases in halobacteria. Genes. 2019;10:233. 10.3390/genes10030233.
  • 62. Altae-Tran H, Kannan S, Demircioglu FE et al. The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases. Science. 2021;374:57–65. 10.1126/science.abj6856.
  • 63. Aliaga Goltsman DS, Alexander LM, Lin J-L et al. Compact Cas9d and HEARO enzymes for genome editing discovered from uncultivated microbes. Nat Commun. 2022;13:7602. 10.1038/s41467-022-35257-7.
  • 64. Bernheim A, Millman A, Ofir G et al. Prokaryotic viperins produce diverse antiviral molecules. Nature. 2021;589:120–4. 10.1038/s41586-020-2762-2.
  • 65. Dombrowski N, Lee J-H, Williams TA et al. Genomic diversity, lifestyles and evolutionary origins of DPANN archaea. FEMS Microbiol Lett. 2019;366:fnz008. 10.1093/femsle/fnz008.
  • 66. Federhen S. The NCBI Taxonomy database. Nucleic Acids Res. 2012;40:D136–43. 10.1093/nar/gkr1178.
  • 67. Makarova KS, Wolf YI, Alkhnbashi OS et al. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Micro. 2015;13:722–36. 10.1038/nrmicro3569.
  • 68. Olijslager LH, Weijers D, Swarts DC. Distribution of specific prokaryotic immune systems correlates with host optimal growth temperature. NAR Genom Bioinform. 2024;6:lqae105. 10.1093/nargab/lqae105.
  • 69. Bujnicki JM, Rychlewski L, Radlinska M. Polyphyletic evolution of type II restriction enzymes revisited: two independent sources of second-hand folds revealed. Trends Biochem Sci. 2001;26:9–11. 10.1016/S0968-0004(00)01690-X.
  • 70. Pingoud A, Wilson GG, Wende W. Type II restriction endonucleases—a historical perspective and more. Nucleic Acids Res. 2014;42:7489–527. 10.1093/nar/gku447.
  • 71. Oliveira PH, Touchon M, Rocha EPC. Regulation of genetic flux between bacteria by restriction–modification systems. Proc Natl Acad Sci USA. 2016;113:5658–63. 10.1073/pnas.1603257113.
  • 72. Oliveira PH, Fang G. Conserved DNA methyltransferases: a window into fundamental mechanisms of epigenetic regulation in bacteria. Trends Microbiol. 2021;29:28–40. 10.1016/j.tim.2020.04.007.
  • 73. Makarova KS, Wolf YI, Koonin EV. Evolutionary Classification of CRISPR-Cas Systems. In: Barrangou R, Sontheimer EJ, Marraffini LA (eds), CRISPR: Biology and Applications. John Wiley & Sons, Inc, 2022, 13–3. 10.1002/9781683673798.ch2.
  • 74. Makarova KS, Wolf YI, van der Oost J et al. Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements. Biol Direct. 2009;4:29. 10.1186/1745-6150-4-29.
  • 75. Koopal B, Mutte SK, Swarts DC. A long look at short prokaryotic Argonautes. Trends Cell Biol. 2023;33:605–18. 10.1016/j.tcb.2022.10.005.
  • 76. Dy RL, Przybilski R, Semeijn K et al. A widespread bacteriophage abortive infection system functions through a Type IV toxin-antitoxin mechanism. Nucleic Acids Res. 2014;42:4590–605. 10.1093/nar/gkt1419.
  • 77. Chopin M-C, Chopin A, Bidnenko E. Phage abortive infection in lactococci: variations on a theme. Curr Opin Microbiol. 2005;8:473–9. 10.1016/j.mib.2005.06.006.
  • 78. Anantharaman V, Makarova KS, Burroughs AM et al. Comprehensive analysis of the HEPN superfamily: identification of novel roles in intra-genomic conflicts, defense, pathogenesis and RNA processing. Biol Direct. 2013;8:15. 10.1186/1745-6150-8-15.
  • 79. Songailiene I, Juozapaitis J, Tamulaitiene G et al. HEPN-MNT toxin-antitoxin system: the HEPN ribonuclease is neutralized by OligoAMPylation. Mol Cell. 2020;80:955–970. 10.1016/j.molcel.2020.11.034.
  • 80. Zink IA, Wimmer E, Schleper C. Heavily armed ancestors: CRISPR immunity and applications in archaea with a comparative analysis of CRISPR types in Sulfolobales. Biomolecules. 2020;10:1523. 10.3390/biom10111523.
  • 81. Baker BJ, De Anda V, Seitz KW et al. Diversity, ecology and evolution of Archaea. Nat Microbiol. 2020;5:887–900. 10.1038/s41564-020-0715-z.
  • 82. Lewis WH, Tahon G, Geesink P et al. Innovations to culturing the uncultured microbial majority. Nat Rev Micro. 2021;19:225–40. 10.1038/s41579-020-00458-8.
  • 83. Harrison C, Allers T. Progress and challenges in archaeal genetic manipulation. Methods Mol Biol. 2022;2522:25–31.
  • 84. Schlesner M, Miller A, Streif S et al. Identification of Archaea-specific chemotaxis proteins which interact with the flagellar apparatus. BMC Microbiol. 2009;9:56. 10.1186/1471-2180-9-56.
  • 85. Makarova KS, Wolf YI, Karamycheva S et al. Antimicrobial peptides, polymorphic toxins, and self-nonself recognition systems in archaea: an untapped armory for intermicrobial conflicts. mBio. 2019;10:e00715–19. 10.1128/mBio.00715-19.
  • 86. Chamieh H, Ibrahim H, Kozah J. Genome-wide identification of SF1 and SF2 helicases from archaea. Gene. 2016;576:214–28. 10.1016/j.gene.2015.10.007.
  • 87. Batista M, Langendijk-Genevaux P, Kwapisz M et al. Evolutionary and functional insights into the Ski2-like helicase family in Archaea: a comparison of Thermococcales ASH-Ski2 and Hel308 activities. NAR Genom Bioinform. 2024;6:lqae026. 10.1093/nargab/lqae026.
  • 88. Liu J, Tassinari M, Souza DP et al. Bacterial Vipp1 and PspA are members of the ancient ESCRT-III membrane-remodeling superfamily. Cell. 2021;184:3660–3673. 10.1016/j.cell.2021.05.041.
  • 89. Moi D, Nishio S, Li X et al. Discovery of archaeal fusexins homologous to eukaryotic HAP2/GCS1 gamete fusion proteins. Nat Commun. 2022;13:3880. 10.1038/s41467-022-31564-1.
  • 90. Makarova KS, Aravind L, Wolf YI et al. Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems. Biol Direct. 2011;6:38. 10.1186/1745-6150-6-38.
  • 91. Meaden S, Biswas A, Arkhipova K et al. High viral abundance and low diversity are associated with increased CRISPR-Cas prevalence across microbial ecosystems. Curr Biol. 2022;32:220–227. 10.1016/j.cub.2021.10.038.
  • 92. Bhoobalan-Chitty Y, Xu S, Martinez-Alvarez L et al. Regulatory sequence-based discovery of anti-defense genes in archaeal viruses. Nat Commun. 2024;15:3699. 10.1038/s41467-024-48074-x.
  • 93. DeWeirdt PC, Mahoney EM, Laub MT. DefensePredictor: a machine learning model to discover novel prokaryotic immune systems. bioRxiv, 08 January 2025, preprint: not peer reviewed. 10.1101/2025.01.08.631726.
  • 94. Rousset F, Depardieu F, Miele S et al. Phages and their satellites encode hotspots of antiviral systems. Cell Host Microbe. 2022;30:740–753. 10.1016/j.chom.2022.02.018.
  • 95. LeGault KN, Barth ZK, DePaola P et al. A phage parasite deploys a nicking nuclease effector to inhibit viral host replication. Nucleic Acids Res. 2022;50:8401–17. 10.1093/nar/gkac002.
  • 96. Fillol-Salom A, Rostøl JT, Ojiogu AD et al. Bacteriophages benefit from mobilizing pathogenicity islands encoding immune systems against competitors. Cell. 2022;185:3248–3262. 10.1016/j.cell.2022.07.014.
  • 97. Dedrick RM, Jacobs-Sera D, Bustamante CAG et al. Prophage-mediated defence against viral attack and viral counter-defence. Nat Microbiol. 2017;2:1–13. 10.1038/nmicrobiol.2016.251.
  • 98. Pinilla-Redondo R, Russel J, Mayo-Muñoz D et al. CRISPR-Cas systems are widespread accessory elements across bacterial and archaeal plasmids. Nucleic Acids Res. 2022;50:4315–28. 10.1093/nar/gkab859.
  • 99. Peng X, Brügger K, Shen B et al. Genus-specific protein binding to the large clusters of DNA repeats (short regularly spaced repeats) present in Sulfolobus genomes. J Bacteriol. 2003;185:2410–7. 10.1128/JB.185.8.2410-2417.2003.
  • 100. Liu G, She Q, Garrett RA. Diverse CRISPR-Cas responses and dramatic cellular DNA changes and cell death in pKEF9-conjugated Sulfolobus species. Nucleic Acids Res. 2016;44:4233–42. 10.1093/nar/gkw286.
  • 101. Lin J, Fuglsang A, Kjeldsen AL et al. DNA targeting by subtype I-D CRISPR-Cas shows type I and type III features. Nucleic Acids Res. 2020;48:10470–8. 10.1093/nar/gkaa749.
  • 102. He F, Vestergaard G, Peng W et al. CRISPR-Cas type I-A Cascade complex couples viral infection surveillance to host transcriptional regulation in the dependence of Csa3b. Nucleic Acids Res. 2016;45:gkw1265. 10.1093/nar/gkw1265.
  • 103. Lin J, Alfastsen L, Bhoobalan Y et al. Molecular basis for inhibition of type III-B CRISPR-Cas by an archaeal viral anti-CRISPR protein. 2023. 10.2139/ssrn.4375100.
  • 104. Bhoobalan-Chitty Y, Johansen TB, Di Cianni N et al. Inhibition of type III CRISPR-Cas immunity by an archaeal virus-encoded anti-CRISPR protein. Cell. 2019;179:448–458. 10.1016/j.cell.2019.09.003.
  • 105. Zhang Z, Pan S, Liu T et al. Cas4 nucleases can effect specific integration of CRISPR spacers. J Bacteriol. 2019;201:e00747–18. 10.1128/JB.00747-18.
  • 106. Bar-On YM, Phillips R, Milo R. The biomass distribution on Earth. Proc Natl Acad Sci USA. 2018;115:6506–11. 10.1073/pnas.1711842115.
  • 107. López-García P, Zivanovic Y, Deschamps P et al. Bacterial gene import and mesophilic adaptation in archaea. Nat Rev Micro. 2015;13:447–56.

Associated Data

Supplementary Materials

gkag225_Supplemental_Files

Data Availability Statement

Scripts are available on GitHub at https://github.com/laura-MtA/Martinez-Alvarez_et.al._2026 and on Zenodo at https://doi.org/10.5281/zenodo.18757827.


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

RESOURCES